Abstract

This paper proposes a unifying variational approach for proving and extending some fundamental
information theoretic inequalities. Fundamental information theory results such as maximization of
differential entropy, minimization of Fisher information (Cramer-Rao inequality), worst additive noise
lemma, entropy power inequality (EPI), and extremal entropy inequality (EEI) are interpreted as functional
problems and proved within the framework of calculus of variations. Several applications and possible
extensions of the proposed results are briefly mentioned.

I. Introduction

In the information theory realm, it is well-known that given the second-order moment (or variance), a 
Gaussian density function maximizes the differential entropy. Similarly, given the second-order moment, 
the Gaussian density function minimizes the Fisher information, a result which is referred to as the 
Cramer-Rao inequality in the signal processing literature. Surprisingly, the proofs proposed in the literature
for these fundamental results are quite diverse, and no unifying feature exists. Since differential
entropy or Fisher information is a functional with respect to a probability density function, the most
natural way to establish these results is by approaching them from the perspective of functional analysis.
Although some of these results have been dealt with partially or not at all within the framework of calculus
of variations, this paper presents a unifying variational framework to address these results as well as
numerous other fundamental information theoretic results. A number of challenging information theoretic
inequalities such as the entropy power inequality (EPI) [1] and the extremal entropy inequality (EEI)
[2] can be dealt with successfully in the proposed framework of functionals. Furthermore, the proposed
variational calculus perspective is useful for establishing other novel results and new extensions
of the existing information theoretic inequalities.

The main theme of this paper is to illustrate how some tools from calculus of variations can be used 
successfully to prove some of the fundamental information theoretic inequalities, which have been widely 
used in information theory and other fields. The proposed variational approach provides alternative proofs 
for some of the fundamental information theoretic inequalities and enables finding novel extensions of 
the existing results. More importantly, the proposed variational approach offers a
potential guideline for finding the optimal solution of many other open problems. In addition, as a general
feature, the proposed variational approach enables finding simpler proofs of some quite challenging
results such as EPI or EEI.

Variational calculus techniques have been used with great success in solving important problems
in image processing and computer vision [3] such as image reconstruction (denoising, deblurring),
inverse problems, and image segmentation. Recently, variational techniques were also advocated by
Scutari and Palomar for optimization of multiuser communication systems [4]-[7], for deriving analytical
wireless channel models using the maximum entropy principle when only limited information about
the environment is available [8]-[10], and for designing optimal training sequences for radar and sonar
applications [11]-[13]. The maximum entropy principle has also found applications in spectral estimation (e.g.,
Burg's maximum entropy spectral density estimator [1]) and Bayesian statistics [14].

The major results of this paper are enumerated as follows. First, using calculus of variations, the
maximum differential entropy and minimum Fisher information theorems are proved under the classical
(standard) assumptions found in the literature as well as under a different set of assumptions.
It is shown that, given the second-order moment, a Gaussian density function maximizes the differential
entropy and minimizes the Fisher information. It is also shown that a half-normal density function
maximizes the differential entropy over the set of non-negative random variables, given the second-order
moment. Furthermore, it is shown that a half-normal density function minimizes the Fisher information
over the set of non-negative random variables, provided that the regularity condition is ignored and
the second-order moment is given. It is also shown that a chi density function minimizes the Fisher
information over the set of non-negative random variables, under the assumption that the regularity
condition holds and the second-order moment is given.

Second, a novel proof of the worst additive noise lemma [20] is provided in the proposed functional
framework. Previous proofs of the worst additive noise lemma were based on Jensen's inequality or the data
processing inequality [20], [21]. Unlike the previous proofs, our approach is purely based on calculus of
variations techniques, and both the scalar and vector versions of the lemma are treated.

Third, EPI is proved based on calculus of variations. We first recast EPI as a functional problem.
Then, the necessary optimal solutions for the functional problem are found using Euler's equation and
by exploiting the necessary conditions for the existence of the optimal solution of the considered functional
problem. In the scalar version of EPI, the necessary optimal solution, which is the Gaussian density
function, is actually sufficient since only the Gaussian density function satisfies Euler's equation.
This is one of the main benefits of using calculus of variations, since it allows finding globally optimal
solutions simply by checking the set of solutions imposed by the necessary conditions. In the vector
version of EPI, Euler's equation only shows that the Gaussian density functions are necessarily optimal,
since the covariance matrices of the optimal solutions are not determined. However, this information
alone (i.e., the fact that the optimal solutions are Gaussian) is enough to prove EPI.

Finally, EEI is studied from the perspective of a functional problem. The main advantage of the
proposed new proof is that neither the channel enhancement technique and EPI, used in [2], nor the
equality condition of the data processing inequality and the technique based on moment generating
functions, adopted in [27], are required. Using the unified argument based on calculus of variations, EEI
is proved in a simple manner in this paper.

The rest of this paper is organized as follows. Some variational calculus preliminary results and their
corollaries are first reviewed in Section II. The maximum differential entropy theorem and the minimum
Fisher information theorem (Cramer-Rao inequality) are proved in Section III. In Section IV, the worst
additive noise lemma is introduced and proved based on calculus of variations. EPI and EEI are proved in
Sections V and VI, respectively. In Section VII, some applications of the addressed information theoretic
inequalities are briefly mentioned. Finally, Section VIII concludes this paper.

II. Some Preliminary Calculus of Variations Results 

In this section, we will review some of the fundamental results from variational calculus, and establish 
the concepts, notations and results that will be used constantly throughout the rest of the paper. These 
results are standard and therefore will be described briefly without further details. Readers may
consult any book on calculus of variations such as [15], [16], [17].

Definition 1: A functional $U[f_X]$ is defined as

$$U[f_X] = \int_a^b K(x, f_X, f'_X)\,dx, \tag{1}$$

which is defined on the set of continuous functions. The function $f_X$ is assumed to have a continuous
first-order derivative in $[a, b]$ and to satisfy the boundary conditions $f_X(a) = A_X$ and $f_X(b) = B_X$. The
functional $K(\cdot, \cdot, \cdot)$ is also assumed to have continuous first-order and second-order (partial) derivatives
with respect to (wrt) all of its arguments. Also, the notation $f'_X$ denotes the first-order derivative wrt $x$.
Definition 2: The increment of a functional $U[f_X]$ is defined as

$$\Delta U[h_X] = U[f_X + h_X] - U[f_X], \tag{2}$$

where the function $h_X$ is the increment, and it is independent of the function $f_X$.

Definition 3: Suppose that, given $f_X$,

$$\Delta U[h_X] = \varphi[h_X] + \epsilon \|h_X\|, \tag{3}$$

where $\varphi[h_X]$ is a linear functional, $\epsilon$ goes to zero as $\|h_X\|$ approaches zero, and $\|\cdot\|$ denotes a norm
defined as

$$\|f_X\| = \sum_{i=0}^{n} \max_{a \le x \le b} \left| f_X^{(i)}(x) \right|, \tag{4}$$

where $f_X^{(i)}(x) = (d^i/dx^i) f_X(x)$, and the summation upper index $n$ varies depending on the normed
linear space considered (e.g., if the normed linear space consists of all continuous functions $f_X(x)$
with continuous first-order derivative defined on an interval $[a, b]$, then
$\|f_X\| = \max_{a \le x \le b} |f_X(x)| + \max_{a \le x \le b} |f'_X(x)|$, and in this case $n = 1$). Then, the functional $U[f_X]$ is said to be differentiable, and
the major part of the increment $\varphi[h_X]$ is called the (first-order) variation of the functional $U[f_X]$ and it
is expressed as $\delta U[f_X]$.

Based on Definitions 1, 2, 3 and Taylor's theorem (see [15]), the first-order and the second-order
variations of a functional $U[f_X]$ are expressed as

$$\delta U[f_X] = \int_a^b \left[ K'_{f_X}(x, f_X, f'_X)\, h_X(x) + K'_{f'_X}(x, f_X, f'_X)\, h'_X(x) \right] dx, \tag{5}$$

$$\begin{aligned}
\delta^2 U[f_X] &= \frac{1}{2} \int_a^b \left[ K''_{f_X f_X}(x, f_X, f'_X)\, h_X(x)^2 + 2 K''_{f_X f'_X}(x, f_X, f'_X)\, h_X(x) h'_X(x) + K''_{f'_X f'_X}(x, f_X, f'_X)\, h'_X(x)^2 \right] dx \\
&= \frac{1}{2} \int_a^b \left[ K''_{f'_X f'_X}\, {h'_X}^2 + \left( K''_{f_X f_X} - \frac{d}{dx} K''_{f_X f'_X} \right) h_X^2 \right] dx,
\end{aligned} \tag{6}$$

where $K'_{f_X}$ and $K'_{f'_X}$ are the first-order partial derivatives wrt $f_X$ and $f'_X$, respectively, $K''_{f_X f'_X}$ is the
second-order partial derivative wrt $f_X$ and $f'_X$, $K''_{f_X f_X}$ is the second-order partial derivative wrt $f_X$, and
$K''_{f'_X f'_X}$ is the second-order partial derivative wrt $f'_X$.¹

Theorem 1 ([15]): A necessary condition for the functional $U[f_X]$ in (1) to have an extremum (or,
local optimum) for a given function $f_X = f_{X^*}$ is the following:

$$\delta U[f_{X^*}] = 0, \tag{7}$$

for all admissible $h_X$. This implies

$$K'_{f_{X^*}} - \frac{d}{dx} K'_{f'_{X^*}} = 0, \tag{8}$$

a result which is known as Euler's equation. When the functional in (1) includes multiple functions (e.g.,
$f_{X_1}, \ldots, f_{X_n}$) and multiple integrals wrt $x_1, \ldots, x_n$, then Euler's equation in (8) is changed to

$$K'_{f_{X_i^*}} - \sum_{j=1}^{n} \frac{\partial}{\partial x_j} K'_{f'_{X_i^*, j}} = 0, \quad i = 1, \ldots, n, \tag{9}$$

where $f'_{X_i, j}$ denotes $\partial f_{X_i} / \partial x_j$. In particular, when the functional does not depend on the first-order derivative of the functions $f_{X_1}, \ldots, f_{X_n}$,
the equation in (9) is simplified to

$$K'_{f_{X_i^*}} = 0, \quad i = 1, \ldots, n. \tag{10}$$

Proof: Details of the proof of this theorem can be found, e.g., in [15]. ■
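
The first-order and second-order conditions can be illustrated numerically. The following is a minimal sketch and not part of the original derivation: it assumes the illustrative integrand $K = (f'_X)^2$ on $[0, 1]$ with boundary values $f_X(0) = 0$ and $f_X(1) = 1$, for which Euler's equation reduces to $f''_X = 0$ and gives the extremal $f_{X^*}(x) = x$. The function names and the perturbation direction $h_X(x) = \sin(\pi x)$ are hypothetical choices made only for this example.

```python
import math

def functional(f_prime, a=0.0, b=1.0, n=20000):
    """Midpoint-rule approximation of U[f] = integral of f'(x)^2 over [a, b]."""
    dx = (b - a) / n
    return sum(f_prime(a + (i + 0.5) * dx) ** 2 for i in range(n)) * dx

def perturbed_derivative(eps):
    # Derivative of f(x) = x + eps * sin(pi x); the perturbation vanishes at
    # x = 0 and x = 1, so the boundary conditions are preserved.
    return lambda x: 1.0 + eps * math.pi * math.cos(math.pi * x)

eps = 1e-2
u_star = functional(perturbed_derivative(0.0))   # value at the extremal f*(x) = x
u_plus = functional(perturbed_derivative(eps))
u_minus = functional(perturbed_derivative(-eps))

# Directional derivatives of U along h(x) = sin(pi x), by central differences.
first_variation = (u_plus - u_minus) / (2 * eps)
second_derivative = (u_plus - 2 * u_star + u_minus) / eps ** 2

print(first_variation, second_derivative)
```

In this example the first variation vanishes at the extremal, and the second derivative along the perturbation is positive, which is consistent with a minimum since the integrand is convex in $f'_X$.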

Theorem 2 ([15]): A necessary condition for the functional $U[f_X]$ in (1) to have a minimum for a
given $f_X = f_{X^*}$ is the following:

$$\delta^2 U[f_{X^*}] \ge 0, \tag{11}$$

for all admissible $h_X$. This implies

$$K''_{f'_{X^*} f'_{X^*}} \ge 0. \tag{12}$$

In particular, when the functional in (1) does not depend on the first-order derivative of the function $f_X$,
the inequality in (12) changes into

$$K''_{f_{X^*} f_{X^*}} \ge 0. \tag{13}$$

¹ Throughout the paper, the arguments of functionals or functions are omitted unless the arguments are ambiguous or confusing.



November 21, 2012 DRAFT



When the functional in (1) includes multiple functions (e.g., $f_{X_1}, \ldots, f_{X_n}$) and multiple integrals wrt
$x_1, \ldots, x_n$, then the inequality in (13) is changed into the positive semi-definiteness of the following
matrix:

$$\begin{bmatrix}
K''_{f_{X_1^*} f_{X_1^*}} & \cdots & K''_{f_{X_1^*} f_{X_n^*}} \\
\vdots & \ddots & \vdots \\
K''_{f_{X_n^*} f_{X_1^*}} & \cdots & K''_{f_{X_n^*} f_{X_n^*}}
\end{bmatrix} \succeq 0. \tag{14}$$

Proof: The inequality in (13) is easily derived from the inequality in (12) since $K''_{f'_X f'_X}$ and $K''_{f_X f'_X}$
vanish in (6) when the functional in (1) does not depend on the first-order derivative of the function
$f_X$. Additional details of the proof can be found in [15]. ■
Theorem 3 ([15]): Given the functional

$$U[f_X, f_Y] = \int_a^b K(x, f_X, f_Y, f'_X, f'_Y)\,dx,$$

assume that the admissible functions satisfy the following conditions:

$$f_X(a) = A_X, \quad f_X(b) = B_X, \quad f_Y(a) = A_Y, \quad f_Y(b) = B_Y, \tag{15}$$

$$k(x, f_X, f_Y) = 0, \tag{16}$$

$$L[f_X, f_Y] = \int_a^b L(x, f_X, f_Y, f'_X, f'_Y)\,dx = l, \tag{17}$$

where $a$, $b$, $A_X$, $B_X$, $A_Y$, $B_Y$, and $l$ are constants, and $U[f_X, f_Y]$ is assumed to have an extremum for
$f_X = f_{X^*}$ and $f_Y = f_{Y^*}$.

If $f_{X^*}$ and $f_{Y^*}$ are not extremals of $L[f_X, f_Y]$, or $k'_{f_X}$ and $k'_{f_Y}$ do not vanish simultaneously at any
point in (16), there exists a constant $\lambda$ or a function $\lambda(x)$ such that $f_{X^*}$ and $f_{Y^*}$ are extremals of the
functional

$$\int_a^b \left( K(x, f_X, f_Y, f'_X, f'_Y) + \lambda L(x, f_X, f_Y, f'_X, f'_Y) + \lambda(x) k(x, f_X, f_Y) \right) dx. \tag{18}$$

Based on Theorem 3, the following corollary is derived.

Corollary 1: Given the functional

$$U[f_X, f_Y] = \int_a^b \int_a^b K(x, y, f_X, f_Y)\,dx\,dy, \tag{19}$$

assume that the admissible functions satisfy the following conditions:

$$f_X(a, a) = A_X, \quad f_X(b, b) = B_X, \quad f_Y(a) = A_Y, \quad f_Y(b) = B_Y, \quad k(x, y, f_X, f_Y) = 0,$$

$$L[f_X, f_Y] = \int_a^b \int_a^b L(x, y, f_X, f_Y)\,dx\,dy = l, \tag{20}$$

where $a$, $b$, $A_X$, $B_X$, $A_Y$, and $B_Y$ are constants, $f_X$ is a function of both $x$ and $y$, and $f_Y$ is a function of $y$.
The functional $k(y, f_X, f_Y)$ is defined as $g(y, f_Y) - \int k(x, y, f_X)\,dx$, where $g(y, f_Y)$ is a functional of
$f_Y$ and $k(x, y, f_X)$ is a functional of $f_X$. In addition, $U[f_X, f_Y]$ is assumed to have an extremum for $f_X = f_{X^*}$
and $f_Y = f_{Y^*}$.

Unless $f_{X^*}$ and $f_{Y^*}$ are extremals of $L[f_X, f_Y]$, or $k'_{f_X}$ and $k'_{f_Y}$ simultaneously vanish at any point
of $k(x, y, f_X, f_Y)$, there exists a constant $\lambda$ or a function $\lambda(y)$ such that $f_X = f_{X^*}$ and $f_Y = f_{Y^*}$ is an
extremal of the functional

$$\int_a^b \left( \int_a^b \left[ K(x, y, f_X, f_Y) + \lambda L(x, y, f_X, f_Y) - \lambda(y) k(x, y, f_X) \right] dx + \lambda(y) g(y, f_Y) \right) dy. \tag{21}$$

Proof: This corollary is a simple extension of Theorem 3 to multiple integrals. Therefore, the
detailed proof is omitted. ■

Based on Theorems 1, 2 and Corollary 1, we can derive the following corollary, which will be mainly
used throughout this paper.

Corollary 2: Based on the functional defined in (21), the following necessary conditions are derived
for the optimal solutions $f_{X^*}$ and $f_{Y^*}$:

$$K'_{f_{X^*}}(x, y, f_{X^*}, f_{Y^*}) - \lambda L'_{f_{X^*}}(x, y, f_{X^*}, f_{Y^*}) - \lambda(y) k'_{f_{X^*}}(x, y, f_{X^*}) = 0, \tag{22}$$

$$\int \left[ K'_{f_{Y^*}}(x, y, f_{X^*}, f_{Y^*}) - \lambda L'_{f_{Y^*}}(x, y, f_{X^*}, f_{Y^*}) \right] dx + \lambda(y) g'_{f_{Y^*}}(y, f_{Y^*}) = 0, \tag{23}$$

and the matrix

$$\begin{bmatrix}
G''_{f_{X^*} f_{X^*}} & G''_{f_{X^*} f_{Y^*}} \\
G''_{f_{Y^*} f_{X^*}} & G''_{f_{Y^*} f_{Y^*}}
\end{bmatrix}, \tag{24}$$

where the functional $G$ is defined as

$$G(x, y, f_{X^*}, f_{Y^*}) = K(x, y, f_{X^*}, f_{Y^*}) - \lambda L(x, y, f_{X^*}, f_{Y^*}) - \lambda(y) k(x, y, f_{X^*}) + \lambda(y) g(y, f_{Y^*}) q(x),$$

and $q(x)$ is a function which satisfies $\int q(x)\,dx = 1$, is positive definite.

Proof: The equations in (22) and (23) are derived from the first-order variation condition in Theorem
1. Namely, the equations in (22) and (23) are Euler's equations for multiple integrals. The positive
definiteness of the matrix in (24) is derived from the second-order variation condition in Theorem 2.
Namely, this is the same as the one in (14). Since the proof is straightforward, the details of the proof
are omitted here. ■
III. MAX Entropy and MIN Fisher Information

The following simple but significant result is well known: given the second-order moment (or variance) of a
random variable, a Gaussian density function maximizes the differential entropy and minimizes the Fisher
information. However, a complete and rigorous proof can hardly be found. In this section, using calculus
of variations, complete rigorous proofs will be provided.

Theorem 4 ([1]): Given (the first-order) and the second-order moments of a random variable $X$,
the differential entropy of the random variable $X$ is maximized when $X$ is Gaussian, i.e.,

$$h(X) \le h(X_G), \tag{25}$$

where $h(\cdot)$ denotes differential entropy, and $X_G$ is a Gaussian random variable whose (first-order) and
second-order moments are identical to those of $X$.

Proof: In [1], the proof relies on calculus of variations to find the first-order necessary condition,
which identifies the necessary optimal solutions. However, the first-order necessary condition shows neither
whether the solutions are local minima or local maxima nor whether the solutions are locally optimal or
globally optimal. Therefore, an additional technique, the Kullback-Leibler divergence, was used to prove
that the necessary solution globally maximizes the differential entropy. Unlike this proof, by confirming
both the first-order and the second-order necessary conditions, one can show that the optimal solution is
a local maximum. Then, it can be shown that the local maximum is an actual global maximum
by proving that the local maximum is the only solution in the feasible set. Therefore, one can
prove Theorem 4 solely based on calculus of variations arguments. See Appendix A for the details of
the proof.

Remark 1: Even though the proposed proof is performed assuming constraints on the first-order and
the second-order moments, the constraint on the first-order moment is not necessary. This will be shown
in the proof of Theorem 5, which is the vector version of this theorem.

■

Similar to Theorem 4, given a correlation matrix (or a covariance matrix), a multivariate Gaussian
density function maximizes the differential entropy as shown by the following theorem.

Theorem 5 ([1], [27]): Given (a mean vector $\boldsymbol{\mu}_X$) and a correlation matrix $\boldsymbol{\Omega}_X$, a Gaussian random
vector maximizes the differential entropy, i.e.,

$$h(\mathbf{X}) \le h(\mathbf{X}_G), \tag{26}$$

where $h(\cdot)$ denotes differential entropy, $\mathbf{X}$ is an arbitrary but fixed random vector with the correlation
matrix $\boldsymbol{\Omega}_X$, and $\mathbf{X}_G$ is a Gaussian random vector whose correlation matrix is identical to that of $\mathbf{X}$.

Proof: See Appendix B. ■

Remark 2: The proposed proof is different from the ones mentioned in [1], [27] in the sense that the
proposed proof relies only on variational calculus techniques. Moreover, from the proposed proof, one
can observe that the constraint related to the first-order moment is not necessary.

Remark 3: Depending on the existence of the constraint related to the mean vector, the mean of
the optimal Gaussian density function will change. However, the constraint on the mean vector is not
necessarily required. Details of the proof are presented in Appendix B.

■ 
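
As a quick sanity check of the scalar statement, the sketch below compares standard closed-form differential entropies (in nats) of three zero-mean densities with the same variance. The choice of the Laplace and uniform densities as competitors, and all function names, are assumptions made only for this illustration; the check does not replace the variational proof.

```python
import math

def gaussian_entropy(var):
    # h = 0.5 * ln(2 * pi * e * var)
    return 0.5 * math.log(2 * math.pi * math.e * var)

def laplace_entropy(var):
    # Laplace density with scale b has variance 2 b^2 and entropy 1 + ln(2 b).
    b = math.sqrt(var / 2)
    return 1 + math.log(2 * b)

def uniform_entropy(var):
    # Uniform density of width w has variance w^2 / 12 and entropy ln(w).
    width = math.sqrt(12 * var)
    return math.log(width)

var = 1.0
h_gauss = gaussian_entropy(var)
h_laplace = laplace_entropy(var)
h_uniform = uniform_entropy(var)
print(h_gauss, h_laplace, h_uniform)
```

For any common variance, the Gaussian value is the largest of the three, in agreement with the theorem.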

If we only consider non-negative random variables, a Gaussian random variable is not the solution which
maximizes the differential entropy. The following theorem shows that a half-normal random variable
maximizes the differential entropy over the set of non-negative random variables.

Theorem 6: Given an arbitrary but fixed non-negative random variable $X$ and a half-normal random
variable $X_{HN}$, whose second-order moment is identical to that of $X$, the following relationship holds:

$$h(X) \le h(X_{HN}), \tag{27}$$

where $h(\cdot)$ denotes differential entropy.

Proof: See Appendix C. ■
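
The non-negative case can be probed numerically in the same spirit. The sketch below is an illustration under stated assumptions: it uses a plain midpoint rule on a truncated support, fixes the second-order moment to 1, and picks the exponential density as the only competitor. The helper name `moment_and_entropy` and the truncation limits are hypothetical.

```python
import math

def moment_and_entropy(pdf, upper, n=200000):
    """Midpoint-rule estimates of E[X^2] and h(X) for a density on [0, upper]."""
    dx = upper / n
    second_moment, entropy = 0.0, 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        p = pdf(x)
        if p > 0.0:
            second_moment += x * x * p * dx
            entropy -= p * math.log(p) * dx
    return second_moment, entropy

m2 = 1.0  # common second-order moment

# Half-normal density with E[X^2] = m2.
half_normal = lambda x: math.sqrt(2.0 / (math.pi * m2)) * math.exp(-x * x / (2.0 * m2))

# Exponential density with rate chosen so that E[X^2] = 2 / rate^2 = m2.
rate = math.sqrt(2.0 / m2)
exponential = lambda x: rate * math.exp(-rate * x)

hn_m2, hn_h = moment_and_entropy(half_normal, upper=12.0)
ex_m2, ex_h = moment_and_entropy(exponential, upper=40.0)
print(hn_m2, hn_h, ex_m2, ex_h)
```

Both densities meet the moment constraint, and the half-normal entropy is the larger one, as the theorem predicts for this pair.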
Similar to Theorems 4, 5, and 6, we can find a probability density function which minimizes the
Fisher information.

Theorem 7 (Cramer-Rao Inequality): Given (the first-order moment $\mu_X$) and the second-order moment
$m_X^2$, a Gaussian random variable $X_G$ minimizes the Fisher information, i.e.,

$$J(X) \ge J(X_G), \tag{28}$$

where $X$ is an arbitrary but fixed random variable with the first-order moment $\mu_X$ and the second-order
moment $m_X^2$. The notation $J(\cdot)$ denotes the Fisher information, and it is defined as

$$J(X) = \int \left( \frac{\frac{d}{dx} f_X(x)}{f_X(x)} \right)^2 f_X(x)\,dx.$$

Proof: See Appendix D.

Remark 4: Even though several proofs of this theorem have been proposed in the literature, this is the 
first rigorous proof of this theorem based on calculus of variations techniques. 

■ 
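
The inequality can be checked numerically for a specific competitor. The sketch below is an illustration only: it assumes the Fisher information is taken with respect to a location parameter, so that the score is $f'_X(x)/f_X(x)$, and it compares a zero-mean Gaussian density with a zero-mean logistic density of the same variance. The logistic density, the truncation limits, and the function names are hypothetical choices for this example.

```python
import math

def fisher_information(pdf, score, lower, upper, n=200000):
    """Midpoint-rule estimate of the integral of score(x)^2 * pdf(x),
    where score(x) = f'(x) / f(x)."""
    dx = (upper - lower) / n
    total = 0.0
    for i in range(n):
        x = lower + (i + 0.5) * dx
        total += score(x) ** 2 * pdf(x) * dx
    return total

variance = 1.0

# Zero-mean Gaussian density and its score.
gauss_pdf = lambda x: math.exp(-x * x / (2 * variance)) / math.sqrt(2 * math.pi * variance)
gauss_score = lambda x: -x / variance

# Zero-mean logistic density with scale s (variance s^2 * pi^2 / 3) and its score.
s = math.sqrt(3 * variance) / math.pi
logistic_pdf = lambda x: 1.0 / (4 * s * math.cosh(x / (2 * s)) ** 2)
logistic_score = lambda x: -math.tanh(x / (2 * s)) / s

j_gauss = fisher_information(gauss_pdf, gauss_score, -12.0, 12.0)
j_logistic = fisher_information(logistic_pdf, logistic_score, -30.0, 30.0)
print(j_gauss, j_logistic)
```

With unit variance the Gaussian value is the reciprocal of the variance, and the logistic value is strictly larger, consistent with (28).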

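Theorem 7 can be generalized to random vectors as shown in the following theorem.

Theorem 8 (Cramer-Rao Inequality (a vector version)): Given an arbitrary but fixed random vector $\mathbf{X}$
and a Gaussian random vector $\mathbf{X}_G$, whose mean vectors and correlation matrices are identical, respectively,
the following inequality holds:

$$\mathbb{J}(\mathbf{X}) \succeq \mathbb{J}(\mathbf{X}_G), \tag{29}$$

where $\mathbb{J}(\cdot)$ denotes the Fisher information matrix, and it is defined as

$$\mathbb{J}(\mathbf{X}) = \begin{bmatrix}
s_{11} & \cdots & s_{1n} \\
\vdots & \ddots & \vdots \\
s_{n1} & \cdots & s_{nn}
\end{bmatrix}, \tag{30}$$

$$s_{ij} = \int \left( \frac{\frac{\partial}{\partial x_i} f_X(\mathbf{x})}{f_X(\mathbf{x})} \right) \left( \frac{\frac{\partial}{\partial x_j} f_X(\mathbf{x})}{f_X(\mathbf{x})} \right) f_X(\mathbf{x})\,d\mathbf{x}.$$

Proof: See Appendix E. ■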
Similar to Theorem 7, a half-normal and a chi density function minimize the Fisher information over
the set of non-negative random variables as shown in the following two theorems.

Theorem 9: Assume that the regularity condition for Fisher information is ignored. Given an arbitrary
but fixed non-negative random variable $X$ and a half-normal random variable $X_{HN}$, whose second-order
moment is identical to that of $X$, the following inequality holds:

$$J(X) \ge J(X_{HN}), \tag{31}$$

where $J(\cdot)$ denotes Fisher information. The regularity condition is the following relationship:

$$\int \frac{d}{dx} f_X(x)\,dx = 0. \tag{32}$$

Proof: See Appendix F. ■
Theorem 10 ([18]): Assume next that random variables which satisfy the regularity condition in (32)
are considered. Given an arbitrary but fixed non-negative random variable $X$ and a chi-distributed random
variable $X_C$, whose second-order moment is identical to that of $X$, the following inequality holds:

$$J(X) \ge J(X_C), \tag{33}$$

where $J(\cdot)$ stands for the Fisher information.

Proof: Unlike the proof in [18], by considering the first-order and the second-order moments instead
of the variance, we obtain convex constraint sets. Since Fisher information is a strictly convex functional
with respect to a probability density function, the variational problem is convex, and hence has a unique
solution. The details of the proof are deferred to Appendix G. ■
IV. Worst Additive Noise Lemma

The worst additive noise lemma was introduced and exploited in several references [20], [21], [23], and it
has been widely used in numerous applications. One of the main applications of the worst additive noise
lemma pertains to the calculation of channel capacity under several different wireless communications
scenarios such as the Gaussian MIMO broadcast channel, the Gaussian MIMO wire-tap channel, etc. In
this section, the worst additive noise lemma for both random variables and random vectors will be proved
solely based on calculus of variations arguments.

Theorem 11: Assume $X$ is an arbitrary but fixed random variable and $X_G$ is a Gaussian random
variable whose second-order moment is identical to that of $X$, which is denoted as $m_X^2$. Given a
Gaussian random variable $W_G$, which is independent of both $X$ and $X_G$, with the second-order moment
$m_W^2$, the following relationship holds:

$$I(X + W_G; W_G) \ge I(X_G + W_G; W_G), \tag{34}$$

where $I(\cdot\,;\cdot)$ denotes mutual information.

Proof: The details of the proof are deferred to Appendix H. ■
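
A numerical illustration of the scalar lemma is sketched below. It is an example under stated assumptions rather than part of the proof: $X$ is taken to be zero-mean uniform, both second-order moments are set to 1, and the identity $I(X + W_G; W_G) = h(X + W_G) - h(X)$ (valid for independent $X$ and $W_G$) is used. The density of the sum is the standard uniform-Gaussian convolution; the function names and integration limits are hypothetical.

```python
import math

def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def entropy_uniform_plus_gaussian(a, sigma_w, n=40000):
    """Differential entropy (nats) of Y = X + W with X ~ Uniform[-a, a] and
    W ~ N(0, sigma_w^2), by midpoint integration of -p log p."""
    lim = a + 10.0 * sigma_w
    dy = 2.0 * lim / n
    total = 0.0
    for i in range(n):
        y = -lim + (i + 0.5) * dy
        # Density of the sum: convolution of the uniform and Gaussian densities.
        p = (std_normal_cdf((y + a) / sigma_w) - std_normal_cdf((y - a) / sigma_w)) / (2.0 * a)
        if p > 0.0:
            total -= p * math.log(p) * dy
    return total

m_x, m_w = 1.0, 1.0               # second-order moments (zero means assumed)
a = math.sqrt(3.0 * m_x)          # Uniform[-a, a] has variance a^2 / 3

# I(X + W; W) = h(X + W) - h(X), with h(X) = ln(2a) for the uniform input.
mi_uniform = entropy_uniform_plus_gaussian(a, math.sqrt(m_w)) - math.log(2.0 * a)
# Gaussian input: h(X_G + W_G) - h(X_G) = 0.5 * ln(1 + m_w / m_x).
mi_gaussian = 0.5 * math.log(1.0 + m_w / m_x)
print(mi_uniform, mi_gaussian)
```

In this instance the uniform input yields the larger mutual information, so the Gaussian input is the less favorable one, in line with (34).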

Similarly, Theorem 11 can be generalized to random vectors as shown in the following theorem.

Theorem 12: Assume $\mathbf{X}$ is an arbitrary but fixed random vector and $\mathbf{X}_G$ is a Gaussian random vector
whose correlation matrix is identical to that of $\mathbf{X}$, and it is denoted as $\boldsymbol{\Omega}_X$. Given a Gaussian random
vector $\mathbf{W}_G$, which is independent of both $\mathbf{X}$ and $\mathbf{X}_G$, with the correlation matrix $\boldsymbol{\Omega}_W$, the following
relation holds:

$$I(\mathbf{X} + \mathbf{W}_G; \mathbf{W}_G) \ge I(\mathbf{X}_G + \mathbf{W}_G; \mathbf{W}_G). \tag{35}$$

Proof: Our novel proof is entirely based on calculus of variations arguments. The summary of
our proof is the following. First, we construct a variational problem, which represents the inequality in
(35) and the required constraints in a functional form. Second, using the first-order variation condition, we
find the necessary optimal solutions, which satisfy Euler's equation. Third, using the second-order variation
condition, we show that the optimal solutions are necessarily local minima. Finally, we prove that the
local minimum is also global. The details of the proof are presented in Appendix I. ■

V. Entropy Power Inequality 

Entropy power inequality (EPI) is a powerful result that has found applicability in determining the capacity
of the scalar Gaussian broadcast channel [24], the capacity of the Gaussian MIMO broadcast channel [2], [19],
the secrecy capacity of the Gaussian wire-tap channel [25], [27], etc., in conjunction with Fano's inequality
and additional techniques such as the ones proposed in [19], [27]. In this section, we will prove several
versions of EPI using calculus of variations techniques.

Theorem 13 (Entropy Power Inequality): For two independent random variables $X$ and $W$, whose
entropies and second-order moments are finite,

$$h(a_X X + a_W W) \ge a_X^2 h(X) + a_W^2 h(W), \tag{36}$$

where $a_X^2 + a_W^2 = 1$. The equality holds if and only if $X$ and $W$ are Gaussian random variables.

Proof: See Appendix J. ■

Theorem 14 (Entropy Power Inequality): For two independent random vectors $\mathbf{X}$ and $\mathbf{W}$, with finite
entropies and correlation matrices, the following relation holds:

$$h(a_X \mathbf{X} + a_W \mathbf{W}) \ge a_X^2 h(\mathbf{X}) + a_W^2 h(\mathbf{W}), \tag{37}$$

where $a_X^2 + a_W^2 = 1$. The equality holds if and only if $\mathbf{X}$ and $\mathbf{W}$ are Gaussian random vectors and their
covariance matrices $\boldsymbol{\Sigma}_X$ and $\boldsymbol{\Sigma}_W$ are identical.

Proof: See Appendix K. ■
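
For Gaussian inputs, both sides of this form of EPI have closed-form expressions, which allows a direct check of the inequality and of the equality condition. The sketch below is an illustration restricted to independent zero-mean scalar Gaussian variables; the function names and the chosen variances are assumptions made for the example.

```python
import math

def gaussian_entropy(var):
    return 0.5 * math.log(2 * math.pi * math.e * var)

def epi_gap(a_x_sq, var_x, var_w):
    """Left-hand side minus right-hand side of the inequality for independent
    zero-mean Gaussian X and W, with a_X^2 = a_x_sq and a_W^2 = 1 - a_x_sq."""
    a_w_sq = 1.0 - a_x_sq
    lhs = gaussian_entropy(a_x_sq * var_x + a_w_sq * var_w)
    rhs = a_x_sq * gaussian_entropy(var_x) + a_w_sq * gaussian_entropy(var_w)
    return lhs - rhs

# Unequal variances: the gap is strictly positive for every weight tried.
gaps = [epi_gap(t / 10, 1.0, 4.0) for t in range(1, 10)]

# Equal variances: the gap vanishes up to rounding error.
equal_variance_gap = epi_gap(0.3, 2.5, 2.5)
print(min(gaps), equal_variance_gap)
```

In this Gaussian family the gap closes only when the two variances coincide, which matches the identical-covariance requirement stated in the equality condition of Theorem 14.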

VI. Extremal Entropy Inequality 

Extremal entropy inequality, motivated by multi-terminal information theoretic problems such as the
vector Gaussian broadcast channel and the distributed source coding with a single quadratic distortion
constraint, was proposed by Liu and Viswanath [2]. It is an entropy power inequality which includes a
covariance constraint. Because of the covariance constraint, the extremal entropy inequality could not be
proved directly by using the classical EPI. Therefore, new techniques ([19], [27]) were adopted in the
proofs reported in [2], [27]. In this section, the extremal entropy inequality will be proved using calculus
of variations.

Theorem 15: Assume that $\mu$ is an arbitrary but fixed constant, where $\mu \ge 1$, and $r^2$ is a positive
constant. A Gaussian random variable $W_G$ with variance $\sigma_W^2$ is assumed to be independent of an arbitrary
random variable $X$ with variance $\sigma_X^2 \le r^2$. Then, there exists a Gaussian random variable $X_G^*$ with
variance $\sigma_{X^*}^2$, which satisfies the following inequality:

$$h(X) - \mu h(X + W_G) \le h(X_G^*) - \mu h(X_G^* + W_G), \tag{38}$$

where $\sigma_{X^*}^2 \le r^2$.

Proof: See Appendix L. ■
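
Since the theorem asserts that a Gaussian maximizer exists, the right-hand side can be explored by searching over Gaussian candidates only. The sketch below is a hedged illustration: it assumes zero-mean Gaussian $X$ and $W_G$, uses a simple grid search over the admissible variances, and the function names and parameter values are hypothetical. For $\mu > 1$, setting the derivative of the objective with respect to the variance to zero gives the stationary point $\sigma_W^2 / (\mu - 1)$, which the search should recover when it lies inside the constraint.

```python
import math

def eei_objective(var_x, var_w, mu):
    """h(X) - mu * h(X + W_G) for independent zero-mean Gaussian X and W_G."""
    h_x = 0.5 * math.log(2 * math.pi * math.e * var_x)
    h_sum = 0.5 * math.log(2 * math.pi * math.e * (var_x + var_w))
    return h_x - mu * h_sum

def best_gaussian_variance(r_sq, var_w, mu, n=20000):
    """Grid search over variances in (0, r_sq] for the maximizer of the objective."""
    grid = [r_sq * (i + 1) / n for i in range(n)]
    return max(grid, key=lambda v: eei_objective(v, var_w, mu))

# Constraint inactive: the maximizer is the stationary point var_w / (mu - 1).
interior = best_gaussian_variance(r_sq=2.0, var_w=1.0, mu=3.0)
# Constraint active: the objective is increasing on (0, r_sq], so r_sq is optimal.
boundary = best_gaussian_variance(r_sq=0.25, var_w=1.0, mu=3.0)
print(interior, boundary)
```

The search covers Gaussian inputs only; that no non-Gaussian input does better is precisely the content of the theorem and is not verified by this code.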
Theorem 15 can be generalized to random vectors as shown in the following two theorems.

Theorem 16: Assume that $\mu$ is an arbitrary but fixed constant, where $\mu \ge 1$, and $\boldsymbol{\Sigma}$ is a positive
semi-definite matrix. A Gaussian random vector $\mathbf{W}_G$ with positive definite covariance matrix $\boldsymbol{\Sigma}_W$ is assumed to
be independent of an arbitrary random vector $\mathbf{X}$ whose covariance matrix $\boldsymbol{\Sigma}_X$ satisfies $\boldsymbol{\Sigma}_X \preceq \boldsymbol{\Sigma}$. Then,
there exists a Gaussian random vector $\mathbf{X}_G^*$ with covariance matrix $\boldsymbol{\Sigma}_{X^*}$ which satisfies the following
inequality:

$$h(\mathbf{X}) - \mu h(\mathbf{X} + \mathbf{W}_G) \le h(\mathbf{X}_G^*) - \mu h(\mathbf{X}_G^* + \mathbf{W}_G), \tag{39}$$

where $\boldsymbol{\Sigma}_{X^*} \preceq \boldsymbol{\Sigma}$.

Proof: See Appendix M. ■

Remark 5: As the extremal entropy inequality only shows the existence of necessary optimal solutions
in [2] and [27], the current proof also shows the existence of necessary optimal solutions. In addition,
the proposed proof only exploits calculus of variations tools. Namely, this proof adopts neither
the channel enhancement technique and EPI in [2] nor the EPI and data processing inequality in [27].

Theorem 17: Assume that $\mu$ is an arbitrary but fixed constant, with $\mu \ge 1$, and $\boldsymbol{\Sigma}$ is a positive
semi-definite matrix. Independent Gaussian random vectors $\mathbf{W}_G$ with covariance matrix $\boldsymbol{\Sigma}_W$ and $\mathbf{V}_G$ with
covariance matrix $\boldsymbol{\Sigma}_V$ are assumed to be independent of an arbitrary random vector $\mathbf{X}$ with covariance
matrix $\boldsymbol{\Sigma}_X \preceq \boldsymbol{\Sigma}$. Both covariance matrices $\boldsymbol{\Sigma}_W$ and $\boldsymbol{\Sigma}_V$ are assumed to be positive definite. Then, there
exists a Gaussian random vector $\mathbf{X}_G^*$ with covariance matrix $\boldsymbol{\Sigma}_{X^*}$ which satisfies the following inequality:

$$h(\mathbf{X} + \mathbf{W}_G) - \mu h(\mathbf{X} + \mathbf{V}_G) \le h(\mathbf{X}_G^* + \mathbf{W}_G) - \mu h(\mathbf{X}_G^* + \mathbf{V}_G), \tag{40}$$

where $\boldsymbol{\Sigma}_{X^*} \preceq \boldsymbol{\Sigma}$.

Proof: See Appendix N. ■

Remark 6: The proposed proof does not borrow any techniques from [2]. Even though the proposed
proof adopts the equality condition for the data processing inequality, a result which was also exploited
in [27], the proposed proof is different from the one in [27] from the following perspectives. First, the
proposed proof uses the equality condition of the data processing inequality only once while the proof
in [27] uses it twice. The proof in [2] exploited the channel enhancement technique twice, which is
equivalent to using the equality condition in the data processing inequality. Second, the proposed proof
does not use the moment generating function technique unlike the proof proposed in [27]; instead the
current proof directly exploits a property of the conditional mutual information pertaining to a Markov
chain.
VII. Applications

The importance of information theoretic inequalities such as entropy power inequality, extremal entropy
inequality, etc., has already been demonstrated by several applications. For example, the minimum Fisher information
theorem (Cramer-Rao inequality) and the maximum entropy theorem were used for developing min-max
robust estimation techniques [31], [32]. EPI was first adapted to prove a lower bound on the capacity of
additive noise channels by Shannon [28], and has received considerable interest recently [21], [29]. Also, EPI was
exploited for the scalar Gaussian broadcast channel [24], the scalar quadratic Gaussian CEO problem
[30], etc. The extremal entropy inequality can be used in the vector Gaussian broadcast channel [2],
the distributed source coding with a single quadratic distortion constraint problem [2], the Gaussian
wire-tap channel [27], and so on. Even though these applications were traditionally addressed using
the above mentioned information theoretic inequalities, one can directly approach these applications by 
means of variational calculus techniques. Numerous extensions of maximum entropy theorem, minimum 
Fisher information theorem, additive worst noise lemma, entropy power inequality and extremal entropy 
inequality might be envisioned within the proposed variational calculus framework by imposing various 
restrictions on the range of values assumed by random variables/vectors (e.g., random variables whose 
support is limited to a finite length interval) or on their second or higher-order moments and correlations. 
For example, the problem of finding the worst additive noise under a covariance constraint [20] as well as
establishing multivariate extensions of Costa's entropy power inequality [34] along the lines mentioned by
Liu et al. [26] and Palomar [35], [36] might also be addressed within the proposed variational framework.

VIII. Conclusions 

In this paper, we derived several fundamental information theoretic inequalities using a functional
analysis framework. The main benefit of employing calculus of variations for proving information
theoretic inequalities is that the globally optimal solution is obtained from the necessary conditions
for optimality without additional calculations. The summary of our contributions is the following. First,
the entropy maximizing theorem and Fisher information minimizing theorem were derived under different 
assumptions. Second, the worst additive noise lemma was proved from the perspective of a functional 
problem. Third, the entropy power inequality and the extremal entropy inequality were derived using 
calculus of variations. Finally, applications and possible extensions that could be addressed based on the 
proposed results were briefly mentioned. 
Appendix A
Proof of Theorem 4

Proof: To prove the inequality in (25), we first construct a functional problem as follows:

\[
\min_{f_X} \int f_X(x) \log f_X(x)\,dx, \tag{41}
\]
\[
\text{s.t.} \quad \int f_X(x)\,dx = 1, \qquad \int x f_X(x)\,dx = \mu_X, \qquad \int x^2 f_X(x)\,dx = m_X^2,
\]

where $\mu_X$ is the first-order moment of $X$, and $m_X^2$ represents the second-order moment of $X$.
Using Theorem 3, the functional problem in (41) is expressed as

\[
\min_{f_X} U[f_X],
\]

where $U[f_X] = \int K(x, f_X)\,dx$, $K(x, f_X) = f_X(x)\left(\log f_X(x) + \alpha_0 + \alpha_1 x + \alpha_2 x^2\right)$, and $\alpha_0$, $\alpha_1$, and
$\alpha_2$ are Lagrange multipliers.

The optimal density function $f_{X^*}$ must satisfy the first-order variation condition as follows:

\[
K'_{f_X} - \frac{d}{dx} K'_{f'_X} \Big|_{f_X = f_{X^*}} = 1 + \log f_{X^*}(x) + \alpha_0 + \alpha_1 x + \alpha_2 x^2 = 0. \tag{45}
\]

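Considering the constraints in (42)-(43) and the equation in (45), it follows that

\[
\begin{aligned}
f_{X^*}(x) &= \exp\left\{-1 - \alpha_0 - \alpha_1 x - \alpha_2 x^2\right\} \\
&= \frac{1}{\sqrt{2\pi \frac{1}{2\alpha_2}}} \exp\left\{-\frac{1}{2 \cdot \frac{1}{2\alpha_2}} \left(x + \frac{\alpha_1}{2\alpha_2}\right)^2\right\} \sqrt{2\pi \frac{1}{2\alpha_2}}\, \exp\left\{-\alpha_0 - 1 + \frac{\alpha_1^2}{4\alpha_2}\right\} \\
&= \frac{1}{\sqrt{2\pi (m_X^2 - \mu_X^2)}} \exp\left\{-\frac{1}{2(m_X^2 - \mu_X^2)} (x - \mu_X)^2\right\},
\end{aligned} \tag{46}
\]

where

\[
\alpha_0 = -1 + \frac{\mu_X^2}{2(m_X^2 - \mu_X^2)} + \frac{1}{2}\log 2\pi (m_X^2 - \mu_X^2), \qquad
\alpha_1 = -\frac{\mu_X}{m_X^2 - \mu_X^2}, \qquad
\alpha_2 = \frac{1}{2(m_X^2 - \mu_X^2)}.
\]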
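The multiplier values above can be sanity-checked numerically. The following is a minimal sketch, assuming the stationary density has the form $\exp\{-1 - \alpha_0 - \alpha_1 x - \alpha_2 x^2\}$ implied by the first-order condition; the function names and the test values of the mean and second moment are illustrative assumptions, not part of the paper.

```python
import math

def multipliers(mu, m2):
    """Lagrange multipliers (alpha0, alpha1, alpha2) for mean mu and second moment m2."""
    var = m2 - mu ** 2  # variance, assumed positive
    alpha0 = -1.0 + mu ** 2 / (2.0 * var) + 0.5 * math.log(2.0 * math.pi * var)
    alpha1 = -mu / var
    alpha2 = 1.0 / (2.0 * var)
    return alpha0, alpha1, alpha2

def density_from_multipliers(x, mu, m2):
    """Stationary density exp(-1 - a0 - a1*x - a2*x^2) from the first-order condition."""
    a0, a1, a2 = multipliers(mu, m2)
    return math.exp(-1.0 - a0 - a1 * x - a2 * x * x)

def gaussian_pdf(x, mu, m2):
    """Gaussian density with mean mu and variance m2 - mu^2."""
    var = m2 - mu ** 2
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def max_abs_gap(mu=0.7, m2=2.0):
    """Largest pointwise difference between the two expressions on a small grid."""
    xs = [-3.0 + 0.5 * k for k in range(13)]
    return max(abs(density_from_multipliers(x, mu, m2) - gaussian_pdf(x, mu, m2)) for x in xs)
```

Calling `max_abs_gap()` should return a value at the level of floating-point round-off, which indicates that the two expressions agree on the grid.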

Since the second-order variation of $U[f_X]$ is expressed as

\[
K''_{f_X f_X} \Big|_{f_X = f_{X^*}} = \frac{1}{f_{X^*}(x)}
\]

and it is positive, the optimal solution $f_{X^*}$ minimizes the variational problem in (41).
These first-order and second-order conditions are necessary but not sufficient for optimality.
However, as shown in (45) and (46), there exists only one solution, the Gaussian density function, in the
feasible set. Therefore, the Gaussian density function is also sufficient in this case.

Therefore, the negative differential entropy $-h(X)$ is minimized (or, equivalently, $h(X)$ is maximized)
when $f_X(x)$ is Gaussian, and the proof is completed. ■
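As an illustration of the result, the closed-form differential entropies (in nats) of three densities with a common variance can be compared. This is a small sketch, not part of the proof; the choice of the Laplace and uniform densities as competitors and the function names are assumptions made for the example.

```python
import math

def entropy_gaussian(var):
    """Differential entropy of a Gaussian with variance var."""
    return 0.5 * math.log(2.0 * math.pi * math.e * var)

def entropy_laplace(var):
    """Differential entropy of a Laplace density; its variance is 2*b^2."""
    b = math.sqrt(var / 2.0)
    return 1.0 + math.log(2.0 * b)

def entropy_uniform(var):
    """Differential entropy of a uniform density; its variance is width^2 / 12."""
    width = math.sqrt(12.0 * var)
    return math.log(width)
```

For any fixed variance, the Gaussian value is the largest of the three, in agreement with the theorem.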



Appendix B 
Proof of Theorem [5] 

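Proof: We first construct a functional problem, which represents the inequality in (26) and required
constraints, as follows:

\[
\min_{f_X} \int f_X(\mathbf{x}) \log f_X(\mathbf{x})\,d\mathbf{x}, \tag{49}
\]
\[
\text{s.t.} \quad \int f_X(\mathbf{x})\,d\mathbf{x} = 1, \tag{50}
\]
\[
\int \mathbf{x}\mathbf{x}^T f_X(\mathbf{x})\,d\mathbf{x} = \Omega_X. \tag{51}
\]

Using Theorem 3, the functional problem in (49) is expressed as

\[
\min_{f_X} U[f_X], \tag{52}
\]

where $U[f_X] = \int K(\mathbf{x}, f_X)\,d\mathbf{x} = \int f_X(\mathbf{x}) \left[\log f_X(\mathbf{x}) + \alpha_0 + \sum_{i=1}^{n}\sum_{j=1}^{n} \lambda_{ij} x_i x_j\right] d\mathbf{x}$, and $\alpha_0$ and $\lambda_{ij}$ are
Lagrange multipliers.

Based on Theorem 1 or Corollary 2, by checking the first-order variation condition, we can find the
optimal solution $f_{X^*}(\mathbf{x})$ as follows:

\[
K'_{f_X} - \frac{d}{d\mathbf{x}} K'_{f'_X} \Big|_{f_X = f_{X^*}} = 0, \tag{53}
\]
\[
1 + \log f_{X^*}(\mathbf{x}) + \alpha_0 + \mathbf{x}^T \Lambda \mathbf{x} = 0. \tag{54}
\]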



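Considering the constraints in (50) and (51),

\[
\begin{aligned}
f_{X^*}(\mathbf{x}) &= \exp\left\{-\mathbf{x}^T \Lambda \mathbf{x} - \alpha_0 - 1\right\} \\
&= (2\pi)^{-\frac{n}{2}} \left|\tfrac{1}{2}\Lambda^{-1}\right|^{-\frac{1}{2}} \exp\left\{-\tfrac{1}{2}\mathbf{x}^T \left(\tfrac{1}{2}\Lambda^{-1}\right)^{-1} \mathbf{x}\right\} (2\pi)^{\frac{n}{2}} \left|\tfrac{1}{2}\Lambda^{-1}\right|^{\frac{1}{2}} \exp\{-1 - \alpha_0\} \\
&= (2\pi)^{-\frac{n}{2}} |\Omega_X|^{-\frac{1}{2}} \exp\left\{-\tfrac{1}{2}\mathbf{x}^T \Omega_X^{-1} \mathbf{x}\right\},
\end{aligned} \tag{55}
\]

where

\[
\alpha_0 = -1 + \tfrac{1}{2}\log (2\pi)^n |\Omega_X|, \qquad \Lambda = \tfrac{1}{2}\Omega_X^{-1}. \tag{56}
\]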
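A two-dimensional numerical check of (55) and (56) can be sketched as follows. It assumes $n = 2$ and an arbitrary positive definite correlation matrix chosen for illustration; the helper names are hypothetical.

```python
import math

def inverse_and_det_2x2(m):
    """Inverse and determinant of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    return inv, det

def quad_form(x, m):
    """Quadratic form x^T m x for a 2-vector x."""
    return sum(x[i] * m[i][j] * x[j] for i in range(2) for j in range(2))

def stationary_density(x, omega):
    """exp(-x^T Lambda x - alpha0 - 1) with alpha0 and Lambda taken from (56), n = 2."""
    inv, det = inverse_and_det_2x2(omega)
    alpha0 = -1.0 + 0.5 * math.log((2.0 * math.pi) ** 2 * det)
    lam = [[0.5 * inv[i][j] for j in range(2)] for i in range(2)]
    return math.exp(-quad_form(x, lam) - alpha0 - 1.0)

def gaussian_density(x, omega):
    """Zero-mean bivariate Gaussian density with covariance omega."""
    inv, det = inverse_and_det_2x2(omega)
    return math.exp(-0.5 * quad_form(x, inv)) / (2.0 * math.pi * math.sqrt(det))
```

Evaluating both functions at the same points should give matching values up to round-off.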

Here, two remarks are in order. First, the correlation matrix $\Omega_X$ is assumed to be invertible. When the
correlation matrix is non-invertible, similar to the method shown in [?], we can equivalently rewrite the
functional problem in (49) and its constraints in (51) as

\[
\min_{f_{\tilde{X}}} \int f_{\tilde{X}}(\mathbf{x}) \log f_{\tilde{X}}(\mathbf{x})\,d\mathbf{x}, \tag{57}
\]
\[
\text{s.t.} \quad \int f_{\tilde{X}}(\mathbf{x})\,d\mathbf{x} = 1, \qquad \int \mathbf{x}\mathbf{x}^T f_{\tilde{X}}(\mathbf{x})\,d\mathbf{x} = \Omega_{\tilde{X}}, \tag{58}
\]

where $\tilde{\mathbf{X}}$ is a random vector with correlation matrix $\Omega_{\tilde{X}}$, and $\Omega_{\tilde{X}}$ is a positive definite matrix. Therefore,
without loss of generality, we assume the correlation matrix $\Omega_X$ is invertible. Second, if an additional
constraint on the mean vector of $\mathbf{X}$, $\mu_X$, is given, the optimal solution is a multivariate Gaussian
density function whose mean is $\mu_X$, instead of the zero-mean multivariate Gaussian density function
in (55) (cf. Appendix A).
Since

\[
K''_{f_X f_X} \Big|_{f_X = f_{X^*}} = \frac{1}{f_{X^*}(\mathbf{x})} > 0,
\]

the second-order variation $\delta^2 U[f_{X^*}]$ is positive, and the optimal solution $f_{X^*}$ is a minimal solution for
the variational problem in (49).

Therefore, the negative differential entropy $-h(\mathbf{X})$ is minimized (or, equivalently, $h(\mathbf{X})$ is maximized) when $\mathbf{X}$
is a multivariate Gaussian random vector with zero mean and covariance matrix $\Sigma_X$. Even though
Theorems 1 and 2 are necessary conditions for the minimum, in this case a multivariate Gaussian density
function is an actual solution since it is the only solution in the feasible set. ■
Appendix C
Proof of Theorem 6

Proof: We first construct a functional problem, which represents the inequality in (27) and required
constraints, as follows:

\[
\min_{f_X} \int_0^\infty f_X(x) \log f_X(x)\,dx, \tag{59}
\]
\[
\text{s.t.} \quad \int_0^\infty f_X(x)\,dx = 1, \qquad \int_0^\infty x^2 f_X(x)\,dx = m_X^2. \tag{60}
\]

Using Theorem 3, the functional problem in (59) is expressed as

\[
\min_{f_X} U[f_X], \tag{61}
\]

where $U[f_X] = \int K(x, f_X)\,dx$, $K(x, f_X) = f_X(x)\left(\log f_X(x) + \alpha_0 + \alpha_1 x^2\right)$, and $\alpha_0$ and $\alpha_1$ are Lagrange
multipliers.$^{2}$

Based on Theorem 1 or Corollary 2, the first-order variation condition of $U[f_X]$ is considered as
follows:

\[
K'_{f_X} - \frac{d}{dx} K'_{f'_X} \Big|_{f_X = f_{X^*}} = 1 + \log f_{X^*}(x) + \alpha_0 + \alpha_1 x^2 = 0. \tag{62}
\]



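Considering the constraints in (60) and the equation in (62),

\[
\begin{aligned}
f_{X^*}(x) &= \exp\left\{-\alpha_0 - 1 - \alpha_1 x^2\right\} \\
&= \frac{1}{\sqrt{\pi/(4\alpha_1)}} \exp\left\{-\alpha_1 x^2\right\} \sqrt{\frac{\pi}{4\alpha_1}}\, \exp\{-\alpha_0 - 1\} \\
&= \frac{1}{\sqrt{\pi m_X^2/2}} \exp\left\{-\frac{1}{2 m_X^2} x^2\right\}, \qquad x \geq 0,
\end{aligned} \tag{63}
\]

where

\[
\alpha_0 = -1 + \frac{1}{2}\log\frac{\pi m_X^2}{2}, \qquad \alpha_1 = \frac{1}{2 m_X^2}.
\]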

Since

\[
K''_{f_X f_X} \Big|_{f_X = f_{X^*}} = \frac{1}{f_{X^*}(x)} > 0,
\]

and the second-order variation $\delta^2 U[f_{X^*}] > 0$, the optimal solution $f_{X^*}$ is a minimal solution for the
variational problem in (59).

2 For simplicity of notation, the range of integration will not be explicitly expressed in the rest of this proof. Throughout
the paper, the range of integration will not be explicitly denoted unless the range is ambiguous.

These first-order and second-order conditions are necessary but not sufficient for optimality. However,
as shown in (62) and (63), there exists only one solution, a half-normal density function, in the feasible
set. Therefore, a half-normal density function is also sufficient in this problem.

Therefore, given the second-order moment, the negative differential entropy $-h(X)$ is minimized (or,
equivalently, $h(X)$ is maximized) over the set of non-negative random variables when $f_X(x)$ is a
half-normal density function.
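For illustration, the closed-form differential entropies (in nats) of three densities supported on the non-negative half-line with the same second-order moment can be compared. The exponential and uniform competitors and the function names below are assumptions chosen for this sketch.

```python
import math

def entropy_half_normal(m2):
    """Half-normal density with scale^2 = m2 (its second moment equals m2)."""
    return 0.5 * math.log(math.pi * math.e * m2 / 2.0)

def entropy_exponential(m2):
    """Exponential density with rate r; its second moment is 2 / r^2."""
    rate = math.sqrt(2.0 / m2)
    return 1.0 - math.log(rate)

def entropy_uniform_from_zero(m2):
    """Uniform density on [0, c]; its second moment is c^2 / 3."""
    c = math.sqrt(3.0 * m2)
    return math.log(c)
```

For a common second-order moment, the half-normal value is the largest of the three.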

Remark 7: Since a half-normal random variable has a fixed mean, if we add a constraint on the mean
such as $E_X[X] = \mu_X$ in (60), the inequality in (27) does not hold except when $\mu_X = \sqrt{2 m_X^2/\pi}$, where $\mu_X$
and $m_X^2$ are the first-order moment and the second-order moment of $X$, respectively.
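The mean value quoted in Remark 7 can be checked by simple numerical integration. The sketch below uses a midpoint rule on a truncated interval; the truncation point, the grid size, and the function names are assumptions made for the example.

```python
import math

def half_normal_pdf(x, m2):
    """Half-normal density on x >= 0 with second-order moment m2."""
    return math.sqrt(2.0 / (math.pi * m2)) * math.exp(-x * x / (2.0 * m2))

def moment(k, m2, n=20000):
    """k-th moment by the midpoint rule on [0, 12*sqrt(m2)] (tail mass is negligible)."""
    upper = 12.0 * math.sqrt(m2)
    h = upper / n
    return sum(((i + 0.5) * h) ** k * half_normal_pdf((i + 0.5) * h, m2) for i in range(n)) * h
```

With `m2 = 2.0`, `moment(1, 2.0)` should be close to `math.sqrt(2 * 2.0 / math.pi)`.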



Appendix D 
Proof of Theorem [7] 

Proof: We first construct a functional problem, which represents the inequality in (28) and required
constraints, as follows:

\[
\min_{f_X} \int \frac{f'_X(x)^2}{f_X(x)}\,dx, \tag{64}
\]
\[
\text{s.t.} \quad \int f_X(x)\,dx = 1, \qquad \int x f_X(x)\,dx = \mu_X, \qquad \int x^2 f_X(x)\,dx = m_X^2. \tag{65}
\]

Using Theorem 3, the functional problem in (64) is expressed as

\[
\min_{f_X} U[f_X], \tag{66}
\]

where $U[f_X] = \int K(x, f_X, f'_X)\,dx$, $K(x, f_X, f'_X) = \left(f'_X(x)^2 / f_X(x)\right) + f_X(x)\left(\alpha_0 + \alpha_1 x + \alpha_2 x^2\right)$, and $\alpha_0$,
$\alpha_1$, and $\alpha_2$ are the Lagrange multipliers.

Based on Theorem 1 or Corollary 2, the first-order variation is investigated as follows:

\[
K'_{f_X} - \frac{d}{dx} K'_{f'_X} \Big|_{f_X = f_{X^*}} = \left(\frac{f'_{X^*}(x)}{f_{X^*}(x)}\right)^2 - 2\,\frac{f''_{X^*}(x)}{f_{X^*}(x)} + \alpha_0 + \alpha_1 x + \alpha_2 x^2 = 0. \tag{67}
\]
Unlike Theorem 4, we cannot directly calculate $f_{X^*}(x)$ from the equation in (67). Fortunately, when
$f_{X^*}(x)$ is a Gaussian density function, $\left(f'_{X^*}(x)/f_{X^*}(x)\right)^2 - 2\left(f''_{X^*}(x)/f_{X^*}(x)\right)$ in (67) is expressed as a
quadratic function, which is similar to the quadratic part $\alpha_0 + \alpha_1 x + \alpha_2 x^2$ in (67).

Due to the constraints in (65), a Gaussian density function $f_{X^*}(x)$ is defined as

\[
f_{X^*}(x) = \frac{1}{\sqrt{2\pi (m_X^2 - \mu_X^2)}} \exp\left\{-\frac{1}{2(m_X^2 - \mu_X^2)} (x - \mu_X)^2\right\}. \tag{68}
\]

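By substituting $f_{X^*}(x)$ in (68) into the equation in (67),

\[
\begin{aligned}
&\left(-\frac{x - \mu_X}{m_X^2 - \mu_X^2}\right)^2 - 2\left\{\left(-\frac{x - \mu_X}{m_X^2 - \mu_X^2}\right)^2 - \frac{1}{m_X^2 - \mu_X^2}\right\} + \alpha_0 + \alpha_1 x + \alpha_2 x^2 \\
&\quad = -\frac{1}{(m_X^2 - \mu_X^2)^2}\, x^2 + \frac{2\mu_X}{(m_X^2 - \mu_X^2)^2}\, x - \frac{\mu_X^2}{(m_X^2 - \mu_X^2)^2} + \frac{2}{m_X^2 - \mu_X^2} + \alpha_0 + \alpha_1 x + \alpha_2 x^2 \\
&\quad = 0.
\end{aligned} \tag{69}
\]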



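Since the equation in (69) must be satisfied for any $x$,

\[
\alpha_0 = \frac{\mu_X^2}{(m_X^2 - \mu_X^2)^2} - \frac{2}{m_X^2 - \mu_X^2}, \qquad
\alpha_1 = -\frac{2\mu_X}{(m_X^2 - \mu_X^2)^2}, \qquad
\alpha_2 = \frac{1}{(m_X^2 - \mu_X^2)^2}. \tag{70}
\]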
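The multiplier values in (70) can be checked by evaluating the left-hand side of the first-order condition (67) for a Gaussian density. The sketch below uses the analytic ratios $f'/f$ and $f''/f$ of a Gaussian; the function name and the test parameters are illustrative assumptions.

```python
def euler_residual_gaussian(x, mu, m2):
    """Left-hand side of (67) for a Gaussian with mean mu and second moment m2."""
    var = m2 - mu ** 2
    score = -(x - mu) / var                        # f'(x) / f(x)
    curvature = (x - mu) ** 2 / var ** 2 - 1.0 / var  # f''(x) / f(x)
    alpha0 = mu ** 2 / var ** 2 - 2.0 / var
    alpha1 = -2.0 * mu / var ** 2
    alpha2 = 1.0 / var ** 2
    return score ** 2 - 2.0 * curvature + alpha0 + alpha1 * x + alpha2 * x * x
```

The residual should vanish (up to round-off) for every `x`, which is the content of (69).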



Since

\[
K''_{f'_X f'_X} \Big|_{f_X = f_{X^*}} = \frac{2}{f_{X^*}(x)} > 0 \tag{71}
\]

and the second-order variation $\delta^2 U[f_{X^*}]$ is positive, the optimal solution $f_{X^*}$ minimizes the variational
problem in (64).

Therefore, the Fisher information $J(X)$ is minimized when $f_X(x)$ is Gaussian. Even though Theorems
1 and 2 are necessary conditions for the minimum, in this case a Gaussian density function is also sufficient
for optimality due to the following fact: the objective function is strictly convex and the constraint sets are
convex. Therefore, the proof is completed.
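As an illustration, the closed-form Fisher information (with respect to a location parameter) of three densities with a common variance can be compared. The logistic and Laplace competitors and the function names are assumptions chosen for this sketch.

```python
import math

def fisher_gaussian(var):
    """Fisher information of a Gaussian with variance var."""
    return 1.0 / var

def fisher_logistic(var):
    """Logistic density with scale s: variance s^2 * pi^2 / 3, Fisher information 1 / (3 s^2)."""
    s2 = 3.0 * var / math.pi ** 2
    return 1.0 / (3.0 * s2)

def fisher_laplace(var):
    """Laplace density with scale b: variance 2 b^2, Fisher information 1 / b^2."""
    b2 = var / 2.0
    return 1.0 / b2
```

For each variance, the product of Fisher information and variance is at least one, with equality for the Gaussian.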

Remark 8: Even though this result is well-known in the literature (e.g., [18], [1]), this is the first
rigorous proof based on calculus of variations.

Remark 9: The constraint related to the first-order moment in (65) is not required in this case. Without
this constraint, the optimal solution is a zero-mean Gaussian density function.
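Appendix E
Proof of Theorem 8

Proof: We first construct a functional problem, which represents the inequality in Theorem 8 and the
required constraints, as follows:

\[
\min_{f_X} \int \frac{\xi^T \nabla f_X(\mathbf{x}) \nabla f_X(\mathbf{x})^T \xi}{f_X(\mathbf{x})}\,d\mathbf{x}, \tag{72}
\]
\[
\text{s.t.} \quad \int f_X(\mathbf{x})\,d\mathbf{x} = 1, \qquad \int \mathbf{x} f_X(\mathbf{x})\,d\mathbf{x} = \mu_X, \qquad \int \mathbf{x}\mathbf{x}^T f_X(\mathbf{x})\,d\mathbf{x} = \Omega_X, \tag{73}
\]

where $\xi$ is an arbitrary but fixed non-zero vector, and it is defined as $\xi = [\xi_1, \ldots, \xi_n]^T$.
Using Theorem 3, the functional problem in (72) is expressed as

\[
\min_{f_X} U[f_X], \tag{74}
\]

where $U[f_X] = \int K(\mathbf{x}, f_X, \nabla f_X)\,d\mathbf{x}$, $K(\mathbf{x}, f_X, \nabla f_X) = \left(\xi^T \nabla f_X(\mathbf{x}) \nabla f_X(\mathbf{x})^T \xi / f_X(\mathbf{x})\right) + f_X(\mathbf{x}) \sum_{i=1}^{n} \zeta_i x_i +
\alpha_0 f_X(\mathbf{x}) + f_X(\mathbf{x}) \sum_{i=1}^{n}\sum_{j=1}^{n} \lambda_{ij} x_i x_j$, and $\alpha_0$, $\zeta_i$, and $\lambda_{ij}$ are Lagrange multipliers.

Based on Theorem 1 or 2, by confirming the first-order variation condition, i.e., $\delta U[f_{X^*}] = 0$, we can
find the optimal solution $f_{X^*}(\mathbf{x})$ as follows:

\[
K'_{f_X} - \sum_{i=1}^{n} \frac{\partial}{\partial x_i} K'_{f'_{x_i}} \Big|_{f_X = f_{X^*}} = 0, \tag{75}
\]

where $f'_{x_i}$ denotes $\partial f_X / \partial x_i$ and

\[
K'_{f_X} = -\frac{\xi^T \nabla f_X(\mathbf{x}) \nabla f_X(\mathbf{x})^T \xi}{f_X(\mathbf{x})^2} + \alpha_0 + \zeta^T \mathbf{x} + \mathbf{x}^T \Lambda \mathbf{x}, \qquad
\frac{\partial}{\partial x_i} K'_{f'_{x_i}} = \frac{\partial}{\partial x_i} \left(\frac{2\, \xi^T \nabla f_X(\mathbf{x})\, \xi_i}{f_X(\mathbf{x})}\right). \tag{76}
\]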
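Therefore, the left-hand side of the equation in (75) is expressed as

\[
K'_{f_X} - \sum_{i=1}^{n} \frac{\partial}{\partial x_i} K'_{f'_{x_i}} \Big|_{f_X = f_{X^*}}
= \sum_{i=1}^{n}\sum_{j=1}^{n} \xi_i \xi_j \frac{\frac{\partial}{\partial x_i} f_{X^*}(\mathbf{x})\, \frac{\partial}{\partial x_j} f_{X^*}(\mathbf{x})}{f_{X^*}(\mathbf{x})^2}
- 2 \sum_{i=1}^{n}\sum_{j=1}^{n} \xi_i \xi_j \frac{\frac{\partial}{\partial x_j}\frac{\partial}{\partial x_i} f_{X^*}(\mathbf{x})}{f_{X^*}(\mathbf{x})}
+ \alpha_0 + \sum_{i=1}^{n} \zeta_i x_i + \sum_{i=1}^{n}\sum_{j=1}^{n} \lambda_{ij} x_i x_j \tag{77}
\]
\[
= 0. \tag{78}
\]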



Unlike Theorem 5, we cannot directly calculate $f_{X^*}(\mathbf{x})$ from the equation in (75). Fortunately, the first
two parts in equation (77) are expressed as a quadratic function when $f_{X^*}(\mathbf{x})$ is a multivariate Gaussian
density function, and therefore the multivariate Gaussian density function satisfies the equality in (78).
When $f_{X^*}(\mathbf{x})$ is a multivariate Gaussian density function,

\[
f_{X^*}(\mathbf{x}) = (2\pi)^{-\frac{n}{2}} |\Sigma_X|^{-\frac{1}{2}} \exp\left\{-\tfrac{1}{2} (\mathbf{x} - \mu_X)^T \Sigma_X^{-1} (\mathbf{x} - \mu_X)\right\},
\qquad \text{where } \Sigma_X = \Omega_X - \mu_X \mu_X^T, \quad \Sigma_X^{-1} = [\sigma^*_{ij}], \tag{79}
\]



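its partial derivatives are expressed as follows:

\[
\begin{aligned}
\frac{\partial}{\partial x_i} f_{X^*}(\mathbf{x}) &= -\frac{1}{2} \left(\sum_{l=1}^{n} \sigma^*_{li} (x_l - \mu_{X_l}) + \sum_{m=1}^{n} \sigma^*_{im} (x_m - \mu_{X_m})\right) f_{X^*}(\mathbf{x}), \\
\frac{\partial}{\partial x_j}\frac{\partial}{\partial x_i} f_{X^*}(\mathbf{x}) &= -\frac{1}{2} \left(\sigma^*_{ji} + \sigma^*_{ij}\right) f_{X^*}(\mathbf{x}) \\
&\quad + \frac{1}{4} \left(\sum_{l=1}^{n} \sigma^*_{li} (x_l - \mu_{X_l}) + \sum_{m=1}^{n} \sigma^*_{im} (x_m - \mu_{X_m})\right) \left(\sum_{l=1}^{n} \sigma^*_{lj} (x_l - \mu_{X_l}) + \sum_{m=1}^{n} \sigma^*_{jm} (x_m - \mu_{X_m})\right) f_{X^*}(\mathbf{x}).
\end{aligned} \tag{80}
\]

Without loss of generality, the covariance matrix $\Sigma_X$ is assumed to be invertible for the same reason
mentioned in Appendix B.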
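By substituting the equations in (80) into the equation in (77), it turns out that

\[
\begin{aligned}
&K'_{f_X} - \sum_{i=1}^{n} \frac{\partial}{\partial x_i} K'_{f'_{x_i}} \Big|_{f_X = f_{X^*}} \\
&= \frac{1}{4} \sum_{i=1}^{n}\sum_{j=1}^{n} \xi_i \xi_j \left(\sum_{l=1}^{n} (\sigma^*_{li} + \sigma^*_{il})(x_l - \mu_{X_l})\right) \left(\sum_{m=1}^{n} (\sigma^*_{jm} + \sigma^*_{mj})(x_m - \mu_{X_m})\right) \\
&\quad - \frac{1}{2} \sum_{i=1}^{n}\sum_{j=1}^{n} \xi_i \xi_j \left(\sum_{l=1}^{n} (\sigma^*_{li} + \sigma^*_{il})(x_l - \mu_{X_l})\right) \left(\sum_{m=1}^{n} (\sigma^*_{jm} + \sigma^*_{mj})(x_m - \mu_{X_m})\right) \\
&\quad + \sum_{i=1}^{n}\sum_{j=1}^{n} \xi_i \xi_j (\sigma^*_{ij} + \sigma^*_{ji}) + \alpha_0 + \sum_{i=1}^{n} \zeta_i x_i + \sum_{i=1}^{n}\sum_{j=1}^{n} \lambda_{ij} x_i x_j \\
&= -\frac{1}{4} \sum_{l=1}^{n}\sum_{m=1}^{n} (x_l - \mu_{X_l})(x_m - \mu_{X_m}) \sum_{i=1}^{n}\sum_{j=1}^{n} \xi_i \xi_j (\sigma^*_{li} + \sigma^*_{il})(\sigma^*_{jm} + \sigma^*_{mj}) \\
&\quad + \sum_{i=1}^{n}\sum_{j=1}^{n} \xi_i \xi_j (\sigma^*_{ij} + \sigma^*_{ji}) + \alpha_0 + \sum_{i=1}^{n} \zeta_i x_i + \sum_{i=1}^{n}\sum_{j=1}^{n} \lambda_{ij} x_i x_j \\
&= \sum_{l=1}^{n}\sum_{m=1}^{n} \omega_{lm} (x_l - \mu_{X_l})(x_m - \mu_{X_m}) + \sum_{i=1}^{n}\sum_{j=1}^{n} \psi_{ij} \xi_i \xi_j + \alpha_0 + \sum_{i=1}^{n} \zeta_i x_i + \sum_{i=1}^{n}\sum_{j=1}^{n} \lambda_{ij} x_i x_j \\
&= (\mathbf{x} - \mu_X)^T \Omega (\mathbf{x} - \mu_X) + \xi^T \Psi \xi + \alpha_0 + \zeta^T \mathbf{x} + \mathbf{x}^T \Lambda \mathbf{x} \\
&= \left(\mathbf{x}^T \Omega \mathbf{x} + \mathbf{x}^T \Lambda \mathbf{x}\right) + \left(\zeta^T \mathbf{x} - 2 \mu_X^T \Omega \mathbf{x}\right) + \mu_X^T \Omega \mu_X + \xi^T \Psi \xi + \alpha_0 \\
&= 0,
\end{aligned} \tag{81}
\]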
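where

\[
\begin{aligned}
\Psi &= [\psi_{ij}], \qquad \Omega = [\omega_{lm}], \qquad \Lambda = [\lambda_{ij}], \\
\omega_{lm} &= -\frac{1}{4} \sum_{i=1}^{n}\sum_{j=1}^{n} \xi_i \xi_j (\sigma^*_{li} + \sigma^*_{il})(\sigma^*_{jm} + \sigma^*_{mj}), \qquad i = 1, \ldots, n, \; j = 1, \ldots, n, \; l = 1, \ldots, n, \; m = 1, \ldots, n, \\
\psi_{ij} &= \sigma^*_{ij} + \sigma^*_{ji}, \qquad i = 1, \ldots, n, \; j = 1, \ldots, n.
\end{aligned} \tag{82}
\]

Therefore, the Lagrange multipliers $\alpha_0$, $\zeta$, and $\Lambda$ are defined as

\[
\alpha_0 = -\mu_X^T \Omega \mu_X - \xi^T \Psi \xi, \qquad
\zeta = 2 \Omega \mu_X, \qquad
\Lambda = -\Omega. \tag{83}
\]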

Since the second-order variation condition is satisfied, i.e.,

\[
K''_{f'_X f'_X} \Big|_{f_X = f_{X^*}} > 0, \tag{84}
\]

the optimal solution $f_{X^*}(\mathbf{x})$ minimizes the variational problem in (72). Therefore, the Fisher information
matrix $J(\mathbf{X})$ is minimized when $f_{X^*}(\mathbf{x})$ is a multivariate Gaussian, i.e., $J(\mathbf{X}) \succeq J(\mathbf{X}_G)$. Even though
Theorems 1 and 2 are necessary conditions for the minimum, in this case the multivariate Gaussian density
function is also the global minimum solution since the objective function is strictly convex and
its constraint sets are convex. ■
Appendix F
Proof of Theorem 9

Proof: We first construct a functional problem, which represents the inequality in (31) and required
constraints, as follows:

\[
\min_{f_X} \int_0^\infty \frac{f'_X(x)^2}{f_X(x)}\,dx, \tag{85}
\]
\[
\text{s.t.} \quad \int_0^\infty f_X(x)\,dx = 1, \qquad \int_0^\infty x^2 f_X(x)\,dx = m_X^2. \tag{86}
\]

Using Theorem 3, the functional problem in (85) is expressed as

\[
\min_{f_X} U[f_X], \tag{87}
\]

where $U[f_X] = \int K(x, f_X, f'_X)\,dx$, $K(x, f_X, f'_X) = \left(f'_X(x)^2 / f_X(x)\right) + f_X(x)\left(\alpha_0 + \alpha_1 x^2\right)$, and $\alpha_0$ and
$\alpha_1$ are the Lagrange multipliers.

Based on Theorem 1 or 2, the first-order and the second-order variation conditions of $U[f_X]$ will be
considered as follows. First, the optimal solution $f_{X^*}(x)$ must satisfy the following first-order variation
condition:

\[
K'_{f_X} - \frac{d}{dx} K'_{f'_X} \Big|_{f_X = f_{X^*}} = \left(\frac{f'_{X^*}(x)}{f_{X^*}(x)}\right)^2 - 2\,\frac{f''_{X^*}(x)}{f_{X^*}(x)} + \alpha_0 + \alpha_1 x^2 = 0. \tag{88}
\]

When $f_{X^*}(x)$ is a half-normal density function, $\left(f'_{X^*}(x)/f_{X^*}(x)\right)^2 - 2\left(f''_{X^*}(x)/f_{X^*}(x)\right)$ in (88) is
expressed as a quadratic function, and therefore the equation in (88) can be satisfied.



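Considering the constraints in (86) and $f_{X^*}(x) = \left(1/\sqrt{\pi m_X^2/2}\right) \exp\left(-x^2/(2 m_X^2)\right)$, where $x \geq 0$,

\[
\begin{aligned}
&\left(-\frac{1}{m_X^2}\, x\right)^2 - 2\left\{\left(-\frac{1}{m_X^2}\, x\right)^2 - \frac{1}{m_X^2}\right\} + \alpha_0 + \alpha_1 x^2 \\
&\quad = -\frac{1}{m_X^4}\, x^2 + \frac{2}{m_X^2} + \alpha_0 + \alpha_1 x^2 \\
&\quad = 0.
\end{aligned} \tag{89}
\]

Since the equation in (89) is satisfied for any $x$,

\[
\alpha_0 = -\frac{2}{m_X^2}, \qquad \alpha_1 = \frac{1}{m_X^4}. \tag{90}
\]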
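The values in (90) can be checked by evaluating the left-hand side of (88) for the half-normal density. This is a minimal sketch using the analytic ratios $f'/f = -x/m_X^2$ and $f''/f = x^2/m_X^4 - 1/m_X^2$ on $x > 0$; the function name and test values are illustrative assumptions.

```python
def euler_residual_half_normal(x, m2):
    """Left-hand side of (88) for a half-normal density with second moment m2 (x > 0)."""
    score = -x / m2                         # f'(x) / f(x)
    curvature = x * x / m2 ** 2 - 1.0 / m2  # f''(x) / f(x)
    alpha0 = -2.0 / m2
    alpha1 = 1.0 / m2 ** 2
    return score ** 2 - 2.0 * curvature + alpha0 + alpha1 * x * x
```

The residual should be zero up to round-off for every positive `x`.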
Now, the second-order variation condition is considered as follows. Since

\[
K''_{f'_X f'_X} \Big|_{f_X = f_{X^*}} = \frac{2}{f_{X^*}(x)} > 0, \tag{91}
\]

the second-order variation $\delta^2 U[f_{X^*}] > 0$, and therefore $f_{X^*}$ minimizes the variational problem in (85).
Therefore, the Fisher information $J(X)$ is minimized when $f_X(x)$ is half-normal. Even though Theorems
1 and 2 are necessary conditions for the minimum, in this case a half-normal density function is also sufficient
for optimality due to the strict convexity of the objective function and the convexity of the constraint set in
(85) and (86). Therefore, the proof is completed. ■



Appendix G 
Proof of Theorem 10

Proof: We first construct a functional problem, which represents the inequality in (33) and required
constraints, as follows:

\[
\min_{f_X} \int \frac{f'_X(x)^2}{f_X(x)}\,dx, \tag{92}
\]
\[
\text{s.t.} \quad \int f_X(x)\,dx = 1, \qquad \int x^2 f_X(x)\,dx = m_X^2. \tag{93}
\]

Using Theorem 3, the functional problem in (92) is expressed as

\[
\min_{f_X} U[f_X], \tag{94}
\]

where $U[f_X] = \int K(x, f_X, f'_X)\,dx$, $K(x, f_X, f'_X) = \left(f'_X(x)^2 / f_X(x)\right) + f_X(x)\left(\alpha_0 + \alpha_1 x^2\right)$, and $\alpha_0$ and
$\alpha_1$ are the Lagrange multipliers.

Based on Theorem 1 or Corollary 2, by confirming the first-order variation condition, the optimal
solution $f_{X^*}(x)$ can be found as follows:

\[
K'_{f_X} - \frac{d}{dx} K'_{f'_X} \Big|_{f_X = f_{X^*}} = \left(\frac{f'_{X^*}(x)}{f_{X^*}(x)}\right)^2 - 2\,\frac{f''_{X^*}(x)}{f_{X^*}(x)} + \alpha_0 + \alpha_1 x^2 = 0. \tag{95}
\]

Unfortunately, we cannot directly calculate $f_{X^*}(x)$ from the equation in (95). Instead, we
search for density functions which satisfy the equation in (95). The first two parts, $\left(f'_{X^*}(x)/f_{X^*}(x)\right)^2 -
2\left(f''_{X^*}(x)/f_{X^*}(x)\right)$, in equation (95) are expressed as a quadratic function when $f_{X^*}(x)$ is a chi density
function with 3 degrees of freedom. Therefore, the chi density function satisfies the equation in (95).
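Considering the constraints in (93) and defining $f_{X^*}(x)$ as $\sqrt{2/\pi}\, a^{-3} x^2 \exp\left(-x^2/(2a^2)\right)$, where $a =
\sqrt{m_X^2/3}$, the equation in (95) is expressed as

\[
\begin{aligned}
&\left(\frac{2}{x} - \frac{x}{a^2}\right)^2 - 2\left(\frac{x^2}{a^4} + \frac{2}{x^2} - \frac{5}{a^2}\right) + \alpha_0 + \alpha_1 x^2 \\
&\quad = -\frac{1}{a^4}\, x^2 + \frac{6}{a^2} + \alpha_0 + \alpha_1 x^2 \\
&\quad = 0.
\end{aligned} \tag{96}
\]

Since the equation in (96) must be satisfied for any $x$,

\[
\alpha_0 = -\frac{6}{a^2} = -\frac{18}{m_X^2}, \qquad
\alpha_1 = \frac{1}{a^4} = \left(\frac{3}{m_X^2}\right)^2. \tag{97}
\]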
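The values in (97) can be checked without relying on the analytic derivatives by differentiating the chi density numerically. The following is a rough sketch, assuming central finite differences with a fixed step; the step size, the function names, and the test points are assumptions made for illustration.

```python
import math

def chi3_pdf(x, m2):
    """Chi density with 3 degrees of freedom and second moment m2 (scale a^2 = m2 / 3)."""
    a2 = m2 / 3.0
    return math.sqrt(2.0 / math.pi) * x * x * math.exp(-x * x / (2.0 * a2)) / a2 ** 1.5

def euler_residual_chi3(x, m2, h=1e-4):
    """Left-hand side of (95), with f' and f'' approximated by central differences."""
    f0 = chi3_pdf(x, m2)
    d1 = (chi3_pdf(x + h, m2) - chi3_pdf(x - h, m2)) / (2.0 * h)
    d2 = (chi3_pdf(x + h, m2) - 2.0 * f0 + chi3_pdf(x - h, m2)) / (h * h)
    alpha0 = -18.0 / m2
    alpha1 = 9.0 / (m2 * m2)
    return (d1 / f0) ** 2 - 2.0 * d2 / f0 + alpha0 + alpha1 * x * x
```

The residual is expected to be small, limited only by the finite-difference error.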

Now, using the second-order variation condition, we will confirm that the optimal solution $f_{X^*}$ actually
minimizes the variational problem in (92) as shown in the following equation:

\[
K''_{f'_X f'_X} \Big|_{f_X = f_{X^*}} = \frac{2}{f_{X^*}(x)} > 0. \tag{98}
\]

Therefore, the Fisher information $J(X)$ is minimized when $f_X(x)$ is a chi density function with
3 degrees of freedom and the second-order moment $m_X^2$. Even though Theorems 1 and 2 are necessary
conditions for the minimum, in this case the chi density function is also sufficient for the minimum since the
variational problem in (92) is strictly convex and the constraint set in (93) is convex. Therefore, the proof
is completed.

Remark 10: Both a half normal density function and a chi-density function satisfy Euler's equation. 
Therefore, these two functions are the optimal solutions which minimize Fisher information for non- 
negative random variables. However, a half normal density function does not obey the regularity condition 
for Fisher information while a chi density function satisfies the regularity condition. 
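As a numerical companion to Remark 10, the integral $\int_0^\infty f'(x)^2 / f(x)\,dx$ can be evaluated for both stationary densities. The sketch below uses a midpoint rule on a truncated interval and writes the integrand as $f(x)\,(f'(x)/f(x))^2$; the truncation point, grid size, and function names are assumptions of this example.

```python
import math

def fisher_integral(pdf, score, upper, n=20000):
    """Midpoint-rule value of the integral of pdf(x) * score(x)^2 over (0, upper)."""
    h = upper / n
    return sum(pdf((i + 0.5) * h) * score((i + 0.5) * h) ** 2 for i in range(n)) * h

def fisher_chi3(m2):
    """Integral for the chi density with 3 degrees of freedom and second moment m2."""
    a2 = m2 / 3.0
    pdf = lambda x: math.sqrt(2.0 / math.pi) * x * x * math.exp(-x * x / (2.0 * a2)) / a2 ** 1.5
    score = lambda x: 2.0 / x - x / a2  # f'(x) / f(x)
    return fisher_integral(pdf, score, 12.0 * math.sqrt(m2))

def fisher_half_normal(m2):
    """Integral for the half-normal density with second moment m2."""
    pdf = lambda x: math.sqrt(2.0 / (math.pi * m2)) * math.exp(-x * x / (2.0 * m2))
    score = lambda x: -x / m2  # f'(x) / f(x)
    return fisher_integral(pdf, score, 12.0 * math.sqrt(m2))
```

Under these assumptions the two integrals evaluate to approximately $9/m_X^2$ for the chi density and $1/m_X^2$ for the half-normal density.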
Appendix H
Proof of Theorem 11

Proof: To prove the inequality in (34), the following functional problem is constructed:

\[
\min_{f_X} \int\!\!\int f_X(x) f_{Y|X}(y|x) \left[-\log\left(\int f_X(x) f_{Y|X}(y|x)\,dx\right) + \log f_X(x)\right] dx\,dy \tag{99}
\]
\[
\text{s.t.} \quad \int f_X(x)\,dx = 1, \qquad \int x^2 f_X(x)\,dx = m_X^2. \tag{100}
\]

After substituting the random variable $Y$ for $X + W_G$, its density function is expressed as

\[
f_Y(y) = \int f_X(x) f_{Y|X}(y|x)\,dx = \int f_X(x) f_W(y - x)\,dx. \tag{101}
\]

Then, the problem in (99) and its constraints in (100) are expressed as

\[
\min_{f_X, f_Y} \int\!\!\int f_X(x) f_W(y - x) \left[-\log f_Y(y) + \log f_X(x)\right] dx\,dy \tag{102}
\]
\[
\begin{aligned}
\text{s.t.} \quad &\int\!\!\int f_X(x) f_W(y - x)\,dx\,dy = 1, \\
&\int\!\!\int x^2 f_X(x) f_W(y - x)\,dx\,dy = m_X^2, \\
&\int y^2 f_Y(y)\,dy = m_Y^2, \\
&f_Y(y) = \int f_X(x) f_W(y - x)\,dx.
\end{aligned} \tag{103}
\]
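Using Lagrange multipliers, the functional problem in (102) is denoted as

\[
\min_{f_X, f_Y} \int \left(\int f_X(x) f_W(y - x) \left[-\log f_Y(y) + \log f_X(x) + \alpha_0 + \alpha_1 x^2 - \lambda(y)\right] dx
+ f_Y(y) \left[\alpha_2 y^2 + \lambda(y)\right]\right) dy. \tag{104}
\]

Define a functional $U$ as

\[
U[f_X, f_Y] = \int \left(\int K(x, y, f_X, f_Y)\,dx + \tilde{K}(y, f_Y)\right) dy, \tag{105}
\]

where$^{3}$ $K(x, y, f_X, f_Y) = f_X(x) f_W(y - x)\left[-\log f_Y(y) + \log f_X(x) + \alpha_0 + \alpha_1 x^2 - \lambda(y)\right]$, and $\tilde{K}(y, f_Y) =
f_Y(y)\left[\alpha_2 y^2 + \lambda(y)\right]$.

3 The equation in (105) is denoted as $\int \left(\int K\,dx\right) + \tilde{K}\,dy$ for simplicity of notation.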
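Now, we have to find $f_{X^*}$ and $f_{Y^*}$ which satisfy the first-order variation condition, $\delta U = 0$:

\[
K'_{f_X} \Big|_{f_X = f_{X^*},\, f_Y = f_{Y^*}} = f_W(y - x) \left(-\log f_{Y^*}(y) + \log f_{X^*}(x) + \alpha_0 + \alpha_1 x^2 + 1 - \lambda(y)\right) = 0, \tag{106}
\]
\[
\int K'_{f_Y}\,dx + \tilde{K}'_{f_Y} \Big|_{f_X = f_{X^*},\, f_Y = f_{Y^*}} = -\frac{\int f_{X^*}(x) f_W(y - x)\,dx}{f_{Y^*}(y)} + \alpha_2 y^2 + \lambda(y) = 0. \tag{107}
\]

Since the equations in (106) and (107) are satisfied for any $x$ and $y$,

\[
\begin{aligned}
-\log f_{Y^*}(y) + c_Y - \lambda(y) &= 0, \\
\log f_{X^*}(x) + \alpha_0 + \alpha_1 x^2 + 1 - c_Y &= 0, \\
\lambda(y) &= 1 - \alpha_2 y^2,
\end{aligned} \tag{108}
\]

where $c_Y$ is a constant.
Therefore,

\[
f_{X^*}(x) = \exp\left(-\alpha_0 - \alpha_1 x^2 - 1 + c_Y\right), \qquad
f_{Y^*}(y) = \exp\left(c_Y - 1 + \alpha_2 y^2\right), \tag{109}
\]

and $f_{X^*}(x)$ and $f_{Y^*}(y)$ are rewritten as

\[
\begin{aligned}
f_{X^*}(x) &= \exp\left(-\alpha_0 - \alpha_1 x^2 - 1 + c_Y\right) \\
&= \frac{1}{\sqrt{2\pi \frac{1}{2\alpha_1}}} \exp\left\{-\frac{1}{2 \cdot \frac{1}{2\alpha_1}}\, x^2\right\} \sqrt{2\pi \frac{1}{2\alpha_1}}\, \exp\{-\alpha_0 - 1 + c_Y\}, \\
f_{Y^*}(y) &= \exp\left(c_Y - 1 + \alpha_2 y^2\right) \\
&= \frac{1}{\sqrt{2\pi \left(-\frac{1}{2\alpha_2}\right)}} \exp\left\{-\frac{1}{2\left(-\frac{1}{2\alpha_2}\right)}\, y^2\right\} \sqrt{2\pi \left(-\frac{1}{2\alpha_2}\right)}\, \exp\{c_Y - 1\}.
\end{aligned} \tag{110}
\]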
Considering the constraints in (103), the Lagrange multipliers in (109) and (110) are expressed as

\[
\begin{aligned}
\alpha_0 &= -1 + c_Y + \frac{1}{2}\log 2\pi m_X^2 = \frac{1}{2}\log\frac{m_X^2}{m_Y^2}, \\
\alpha_1 &= \frac{1}{2 m_X^2}, \\
\alpha_2 &= -\frac{1}{2 m_Y^2}, \\
c_Y &= 1 - \frac{1}{2}\log 2\pi m_Y^2.
\end{aligned} \tag{111}
\]

Therefore, Gaussian density functions $f_{X^*}$ and $f_{Y^*}$ satisfy the first-order variation condition, $\delta U = 0$.

Now, the second-order variation condition must be considered, and, for the minimum, it requires the
positive definiteness of the matrix

\[
\begin{bmatrix}
K''_{f_X f_X} & K''_{f_X f_Y} \\
K''_{f_Y f_X} & K''_{f_Y f_Y}
\end{bmatrix}_{f_X = f_{X^*},\, f_Y = f_{Y^*}}. \tag{112}
\]

The elements of the matrix in (112) are calculated as

\[
\begin{aligned}
K''_{f_X f_X} \Big|_{f_X = f_{X^*},\, f_Y = f_{Y^*}} &= \frac{f_W(y - x)}{f_{X^*}(x)}, \\
K''_{f_X f_Y} \Big|_{f_X = f_{X^*},\, f_Y = f_{Y^*}} &= -\frac{f_W(y - x)}{f_{Y^*}(y)}, \\
K''_{f_Y f_X} \Big|_{f_X = f_{X^*},\, f_Y = f_{Y^*}} &= -\frac{f_W(y - x)}{f_{Y^*}(y)}, \\
K''_{f_Y f_Y} \Big|_{f_X = f_{X^*},\, f_Y = f_{Y^*}} &= \frac{f_W(y - x) f_{X^*}(x)}{f_{Y^*}(y)^2},
\end{aligned} \tag{113}
\]

and the matrix in (112) is positive definite. Therefore, $\delta^2 U > 0$, and the optimal solutions $f_{X^*}$ and $f_{Y^*}$
minimize the variational problem in (102). Even though the conditions satisfied by the optimal solutions are
only necessary, there are only Gaussian density functions $f_{X^*}$ and $f_{Y^*}$ in the feasible set, i.e., Gaussian density
functions $f_{X^*}$ and $f_{Y^*}$ are the only ones which satisfy the equations in (106) and (107). Therefore, these optimal
solutions are actually sufficient.

In conclusion, given the second-order moment, a Gaussian random variable $X_G$ minimizes the mutual
information $I(X + W_G; W_G)$, and the proof is completed. ■
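The statement can be illustrated numerically by comparing a Gaussian $X$ with a uniform $X$ of the same second-order moment. The sketch below uses $I(X + W_G; W_G) = h(X + W_G) - h(X)$ and evaluates $h(X + W_G)$ for the uniform case by a midpoint rule; the choice of a zero-mean uniform competitor, the integration range, and the function names are assumptions made for the example.

```python
import math

def std_normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def entropy_uniform_plus_gaussian(var_x, var_w, n=20000):
    """h(X + W) in nats for X uniform on [-c, c] (variance var_x) and W ~ N(0, var_w)."""
    c = math.sqrt(3.0 * var_x)
    s = math.sqrt(var_w)
    upper = c + 10.0 * s
    h = 2.0 * upper / n
    total = 0.0
    for i in range(n):
        y = -upper + (i + 0.5) * h
        # Density of the sum: convolution of the uniform and Gaussian densities.
        f = (std_normal_cdf((y + c) / s) - std_normal_cdf((y - c) / s)) / (2.0 * c)
        if f > 0.0:
            total -= f * math.log(f) * h
    return total

def mi_uniform_input(var_x, var_w):
    """I(X + W; W) = h(X + W) - h(X) for the uniform X above."""
    return entropy_uniform_plus_gaussian(var_x, var_w) - math.log(2.0 * math.sqrt(3.0 * var_x))

def mi_gaussian_input(var_x, var_w):
    """I(X + W; W) for Gaussian X: 0.5 * log(1 + var_w / var_x)."""
    return 0.5 * math.log(1.0 + var_w / var_x)
```

With unit variances, the uniform input yields a larger mutual information than the Gaussian input, consistent with the Gaussian being the minimizer.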
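Appendix I
Proof of Theorem 12

Proof: To prove the inequality in (35), we first construct a functional problem as follows:

\[
\min_{f_X} \; -\int\!\!\int f_X(\mathbf{x}) f_{Y|X}(\mathbf{y}|\mathbf{x}) \log\left(\int f_X(\mathbf{x}) f_{Y|X}(\mathbf{y}|\mathbf{x})\,d\mathbf{x}\right) d\mathbf{x}\,d\mathbf{y}
+ \int\!\!\int f_X(\mathbf{x}) f_{Y|X}(\mathbf{y}|\mathbf{x}) \log f_X(\mathbf{x})\,d\mathbf{x}\,d\mathbf{y} \tag{114}
\]
\[
\text{s.t.} \quad \int f_X(\mathbf{x})\,d\mathbf{x} = 1, \qquad \int \mathbf{x} f_X(\mathbf{x})\,d\mathbf{x} = \mu_X, \qquad \int \mathbf{x}\mathbf{x}^T f_X(\mathbf{x})\,d\mathbf{x} = \Omega_X. \tag{115}
\]

By substituting the random vector $\mathbf{Y}$ for $\mathbf{X} + \mathbf{W}_G$, where $\mathbf{X}$ and $\mathbf{W}_G$ are independent of each other,
in (35), its density function $f_Y(\mathbf{y})$ and conditional density function $f_{Y|X}(\mathbf{y}|\mathbf{x})$ are expressed as

\[
f_Y(\mathbf{y}) = \int f_X(\mathbf{x}) f_{Y|X}(\mathbf{y}|\mathbf{x})\,d\mathbf{x}, \tag{116}
\]
\[
f_{Y|X}(\mathbf{y}|\mathbf{x}) = f_W(\mathbf{y} - \mathbf{x}), \tag{117}
\]

respectively. Therefore, by substituting $f_Y(\mathbf{y})$ for $\int f_X(\mathbf{x}) f_{Y|X}(\mathbf{y}|\mathbf{x})\,d\mathbf{x}$ and $f_W(\mathbf{y} - \mathbf{x})$ for $f_{Y|X}(\mathbf{y}|\mathbf{x})$, and
appropriately changing the constraints in (115), the variational problem in (114) can be expressed as

\[
\min_{f_X, f_Y} \int\!\!\int f_X(\mathbf{x}) f_W(\mathbf{y} - \mathbf{x}) \left[-\log f_Y(\mathbf{y}) + \log f_X(\mathbf{x})\right] d\mathbf{x}\,d\mathbf{y} \tag{118}
\]
\[
\begin{aligned}
\text{s.t.} \quad &\int\!\!\int f_X(\mathbf{x}) f_W(\mathbf{y} - \mathbf{x})\,d\mathbf{x}\,d\mathbf{y} = 1, \\
&\int\!\!\int \mathbf{x} f_X(\mathbf{x}) f_W(\mathbf{y} - \mathbf{x})\,d\mathbf{x}\,d\mathbf{y} = \mu_X, \\
&\int\!\!\int \mathbf{x}\mathbf{x}^T f_X(\mathbf{x}) f_W(\mathbf{y} - \mathbf{x})\,d\mathbf{x}\,d\mathbf{y} = \Omega_X, \\
&\int f_Y(\mathbf{y})\,d\mathbf{y} = 1, \\
&\int \mathbf{y} f_Y(\mathbf{y})\,d\mathbf{y} = \mu_Y, \\
&\int \mathbf{y}\mathbf{y}^T f_Y(\mathbf{y})\,d\mathbf{y} = \Omega_Y, \\
&f_Y(\mathbf{y}) = \int f_X(\mathbf{x}) f_W(\mathbf{y} - \mathbf{x})\,d\mathbf{x}.
\end{aligned} \tag{119}
\]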
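The functional problem in (118) is changed into the following equivalent problem:

\[
\begin{aligned}
\min_{f_X, f_Y} \int \Bigg(&\int f_X(\mathbf{x}) f_W(\mathbf{y} - \mathbf{x}) \left[-\log f_Y(\mathbf{y}) + \log f_X(\mathbf{x}) + \alpha_0 + \sum_{i=1}^{n} \zeta_i x_i + \sum_{i=1}^{n}\sum_{j=1}^{n} \gamma_{ij} x_i x_j - \lambda(\mathbf{y})\right] d\mathbf{x} \\
&+ f_Y(\mathbf{y}) \left[\alpha_1 + \sum_{i=1}^{n} \eta_i y_i + \sum_{i=1}^{n}\sum_{j=1}^{n} \theta_{ij} y_i y_j + \lambda(\mathbf{y})\right]\Bigg) d\mathbf{y},
\end{aligned} \tag{120}
\]

where $\mathbf{x}^T = [x_1, \ldots, x_n]$, $\mathbf{y}^T = [y_1, \ldots, y_n]$, and $\alpha_0$, $\alpha_1$, $\zeta_i$, $\gamma_{ij}$, $\eta_i$, $\theta_{ij}$, and $\lambda(\mathbf{y})$ are Lagrange multipliers.
Let us define the functional $U$ as

\[
U[f_X, f_Y] = \int \left(\int K(\mathbf{x}, \mathbf{y}, f_X, f_Y)\,d\mathbf{x} + \tilde{K}(\mathbf{y}, f_Y)\right) d\mathbf{y},
\]

where

\[
\begin{aligned}
K(\mathbf{x}, \mathbf{y}, f_X, f_Y) &= f_X(\mathbf{x}) f_W(\mathbf{y} - \mathbf{x}) \left[-\log f_Y(\mathbf{y}) + \log f_X(\mathbf{x}) + \alpha_0 + \sum_{i=1}^{n} \zeta_i x_i + \sum_{i=1}^{n}\sum_{j=1}^{n} \gamma_{ij} x_i x_j - \lambda(\mathbf{y})\right], \\
\tilde{K}(\mathbf{y}, f_Y) &= f_Y(\mathbf{y}) \left[\alpha_1 + \sum_{i=1}^{n} \eta_i y_i + \sum_{i=1}^{n}\sum_{j=1}^{n} \theta_{ij} y_i y_j + \lambda(\mathbf{y})\right].
\end{aligned} \tag{121}
\]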



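Based on the first-order variation condition, we can find the optimal solutions, $f_{X^*}$ and $f_{Y^*}$, as follows:

\[
\begin{aligned}
K'_{f_X} \Big|_{f_X = f_{X^*},\, f_Y = f_{Y^*}}
&= f_W(\mathbf{y} - \mathbf{x}) \left(-\log f_{Y^*}(\mathbf{y}) + \log f_{X^*}(\mathbf{x}) + \alpha_0 + \sum_{i=1}^{n} \zeta_i x_i + \sum_{i=1}^{n}\sum_{j=1}^{n} \gamma_{ij} x_i x_j + 1 - \lambda(\mathbf{y})\right) \\
&= f_W(\mathbf{y} - \mathbf{x}) \left(-\log f_{Y^*}(\mathbf{y}) + \log f_{X^*}(\mathbf{x}) + \alpha_0 + \zeta^T \mathbf{x} + \mathbf{x}^T \Gamma \mathbf{x} + 1 - \lambda(\mathbf{y})\right) \\
&= 0,
\end{aligned} \tag{122}
\]
\[
\begin{aligned}
\int K'_{f_Y}\,d\mathbf{x} + \tilde{K}'_{f_Y} \Big|_{f_X = f_{X^*},\, f_Y = f_{Y^*}}
&= -\frac{\int f_{X^*}(\mathbf{x}) f_W(\mathbf{y} - \mathbf{x})\,d\mathbf{x}}{f_{Y^*}(\mathbf{y})} + \alpha_1 + \sum_{i=1}^{n} \eta_i y_i + \sum_{i=1}^{n}\sum_{j=1}^{n} \theta_{ij} y_i y_j + \lambda(\mathbf{y}) \\
&= -\int f_{X^*}(\mathbf{x}) f_W(\mathbf{y} - \mathbf{x})\,d\mathbf{x}\, \frac{1}{f_{Y^*}(\mathbf{y})} + \alpha_1 + \eta^T \mathbf{y} + \mathbf{y}^T \Theta \mathbf{y} + \lambda(\mathbf{y}) \\
&= 0,
\end{aligned} \tag{123}
\]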
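where

\[
\Gamma = \begin{bmatrix} \gamma_{11} & \cdots & \gamma_{1n} \\ \vdots & \ddots & \vdots \\ \gamma_{n1} & \cdots & \gamma_{nn} \end{bmatrix}, \qquad
\Theta = \begin{bmatrix} \theta_{11} & \cdots & \theta_{1n} \\ \vdots & \ddots & \vdots \\ \theta_{n1} & \cdots & \theta_{nn} \end{bmatrix}, \tag{124}
\]

$\zeta = [\zeta_1, \ldots, \zeta_n]^T$, and $\eta = [\eta_1, \ldots, \eta_n]^T$.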

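Since the equalities in (122) and (123) must be satisfied for any $\mathbf{x}$ and $\mathbf{y}$,

\[
\begin{aligned}
0 &= -\log f_{Y^*}(\mathbf{y}) - \lambda(\mathbf{y}), \\
0 &= \log f_{X^*}(\mathbf{x}) + \alpha_0 + \zeta^T \mathbf{x} + \mathbf{x}^T \Gamma \mathbf{x} + 1, \\
\lambda(\mathbf{y}) &= 1 - \alpha_1 - \eta^T \mathbf{y} - \mathbf{y}^T \Theta \mathbf{y},
\end{aligned} \tag{125}
\]

and

\[
f_{X^*}(\mathbf{x}) = \exp\left(-\alpha_0 - \zeta^T \mathbf{x} - \mathbf{x}^T \Gamma \mathbf{x} - 1\right), \qquad
f_{Y^*}(\mathbf{y}) = \exp\left(-1 + \alpha_1 + \eta^T \mathbf{y} + \mathbf{y}^T \Theta \mathbf{y}\right). \tag{126}
\]

Considering the constraints in (119), $f_{X^*}(\mathbf{x})$ and $f_{Y^*}(\mathbf{y})$ in (126) are expressed as

\[
\begin{aligned}
f_{X^*}(\mathbf{x}) &= (2\pi)^{-\frac{n}{2}} |\Sigma_X|^{-\frac{1}{2}} \exp\left\{-\tfrac{1}{2} (\mathbf{x} - \mu_X)^T \Sigma_X^{-1} (\mathbf{x} - \mu_X)\right\} \\
&= \exp\left\{-\tfrac{1}{2}\log (2\pi)^n |\Sigma_X| - \tfrac{1}{2}\mathbf{x}^T \Sigma_X^{-1} \mathbf{x} + \mu_X^T \Sigma_X^{-1} \mathbf{x} - \tfrac{1}{2}\mu_X^T \Sigma_X^{-1} \mu_X\right\} \\
&= \exp\left(-\alpha_0 - \zeta^T \mathbf{x} - \mathbf{x}^T \Gamma \mathbf{x} - 1\right), \\
f_{Y^*}(\mathbf{y}) &= (2\pi)^{-\frac{n}{2}} |\Sigma_Y|^{-\frac{1}{2}} \exp\left\{-\tfrac{1}{2} (\mathbf{y} - \mu_Y)^T \Sigma_Y^{-1} (\mathbf{y} - \mu_Y)\right\} \\
&= \exp\left\{-\tfrac{1}{2}\log (2\pi)^n |\Sigma_Y| - \tfrac{1}{2}\mathbf{y}^T \Sigma_Y^{-1} \mathbf{y} + \mu_Y^T \Sigma_Y^{-1} \mathbf{y} - \tfrac{1}{2}\mu_Y^T \Sigma_Y^{-1} \mu_Y\right\} \\
&= \exp\left(-1 + \alpha_1 + \eta^T \mathbf{y} + \mathbf{y}^T \Theta \mathbf{y}\right),
\end{aligned} \tag{127}
\]

where $\Sigma_X = \Omega_X - \mu_X \mu_X^T$, $\Sigma_Y = \Sigma_X + \Sigma_W$, and $\Sigma_W$ is the covariance matrix of $\mathbf{W}_G$. Based on the
equations in (127),

\[
\begin{aligned}
\alpha_0 &= -1 + \tfrac{1}{2}\log (2\pi)^n |\Sigma_X| + \tfrac{1}{2}\mu_X^T \Sigma_X^{-1} \mu_X, \\
\alpha_1 &= 1 - \tfrac{1}{2}\log (2\pi)^n |\Sigma_Y| - \tfrac{1}{2}\mu_Y^T \Sigma_Y^{-1} \mu_Y, \\
\Gamma &= \tfrac{1}{2}\Sigma_X^{-1}, \qquad \zeta = -\Sigma_X^{-1} \mu_X, \\
\Theta &= -\tfrac{1}{2}\Sigma_Y^{-1}, \qquad \eta = \Sigma_Y^{-1} \mu_Y.
\end{aligned} \tag{128}
\]

Therefore, the optimal solutions $f_{X^*}$ and $f_{Y^*}$ are multivariate Gaussian density functions (without loss
of generality, we assume that the covariance matrix $\Sigma_X$ is invertible due to the reason mentioned in
Appendix B).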

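Now, by confirming the second-order variation condition, we will show that the optimal solutions
$f_{X^*}$ and $f_{Y^*}$ minimize the variational functional in (118). Based on Theorem 2, we will show that the
following matrix is positive definite:

\[
\begin{bmatrix}
K''_{f_X f_X} & K''_{f_X f_Y} \\
K''_{f_Y f_X} & K''_{f_Y f_Y}
\end{bmatrix}_{f_X = f_{X^*},\, f_Y = f_{Y^*}} > 0. \tag{129}
\]

Since the elements of the matrix in (129) are defined as

\[
\begin{aligned}
K''_{f_X f_X} \Big|_{f_X = f_{X^*},\, f_Y = f_{Y^*}} &= \frac{f_W(\mathbf{y} - \mathbf{x})}{f_{X^*}(\mathbf{x})}, \\
K''_{f_Y f_Y} \Big|_{f_X = f_{X^*},\, f_Y = f_{Y^*}} &= \frac{f_{X^*}(\mathbf{x}) f_W(\mathbf{y} - \mathbf{x})}{f_{Y^*}(\mathbf{y})^2}, \\
K''_{f_X f_Y} \Big|_{f_X = f_{X^*},\, f_Y = f_{Y^*}} &= -\frac{f_W(\mathbf{y} - \mathbf{x})}{f_{Y^*}(\mathbf{y})}, \\
K''_{f_Y f_X} \Big|_{f_X = f_{X^*},\, f_Y = f_{Y^*}} &= -\frac{f_W(\mathbf{y} - \mathbf{x})}{f_{Y^*}(\mathbf{y})},
\end{aligned} \tag{130}
\]

the matrix is a positive definite matrix, and therefore $\delta^2 U > 0$. Therefore, the optimal solutions $f_{X^*}$
and $f_{Y^*}$ actually minimize the variational functional in (118). Even though the conditions satisfied by these
optimal solutions are only necessary, there exists only one solution, which is a multivariate Gaussian density function,
that satisfies Euler's equation in (122) and (123). Therefore, $f_{X^*}$ and $f_{Y^*}$ are also sufficient in this
case.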

Remark 11: The constraints related to the mean vectors in (119) are unnecessary. Without these
constraints, the optimal solutions are still multivariate Gaussian density functions but the mean vectors
are changed into zero.
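For the Gaussian minimizer, the objective $h(\mathbf{Y}) - h(\mathbf{X})$ has the closed form $\frac{1}{2}\log\left(|\Sigma_X + \Sigma_W| / |\Sigma_X|\right)$. The sketch below evaluates it for $n = 2$ from the Gaussian entropy formula; the restriction to $2 \times 2$ covariance matrices and the function names are assumptions made for illustration.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def gaussian_entropy_2d(cov):
    """Differential entropy (nats) of a bivariate Gaussian with covariance cov."""
    return 0.5 * math.log((2.0 * math.pi * math.e) ** 2 * det2(cov))

def gaussian_mutual_information(cov_x, cov_w):
    """h(Y) - h(X) for Y = X + W with independent Gaussian X and W."""
    cov_y = [[cov_x[i][j] + cov_w[i][j] for j in range(2)] for i in range(2)]
    return gaussian_entropy_2d(cov_y) - gaussian_entropy_2d(cov_x)
```

For diagonal covariance matrices the value reduces to the sum of the scalar expressions $\frac{1}{2}\log(1 + \sigma_W^2/\sigma_X^2)$ over the two components.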
Appendix J
Proof of Theorem 13

Proof: To prove the entropy power inequality, we slightly change the inequality in (36) into the
following relationship:

\[
h(\tilde{X} + \tilde{W}) \geq a_X^2 h(\tilde{X}) + a_W^2 h(\tilde{W}) - \log a_X - \log a_W, \tag{131}
\]

where $\tilde{X} = a_X X$ and $\tilde{W} = a_W W$. Since $a_X$ and $a_W$ are constants, they do not affect the optimization,
and we can ignore these two terms.

Based on the inequality in (131) and required constraints, construct the following functional problem
(for simplicity of notation, we simply denote $\tilde{X}$ and $\tilde{W}$ as $X$ and $W$):

\[
\min_{f_X, f_W, f_Y} \int\!\!\int f_X(x) f_W(y - x) \left(-\log f_Y(y) + a_X^2 \log f_X(x) + a_W^2 \log f_W(y - x)\right) dx\,dy \tag{132}
\]
\[
\begin{aligned}
\text{s.t.} \quad &\int\!\!\int f_X(x) f_W(y - x)\,dx\,dy = 1, \\
&\int\!\!\int y^2 f_X(x) f_W(y - x)\,dx\,dy = m_{Y^*}^2, \\
&\int\!\!\int x^2 f_X(x) f_W(y - x)\,dx\,dy = m_{X^*}^2, \\
&\int\!\!\int (y - x)^2 f_X(x) f_W(y - x)\,dx\,dy = m_{W^*}^2, \\
&-\int\!\!\int f_X(x) f_W(y - x) \log f_X(x)\,dx\,dy = p_X, \\
&-\int\!\!\int f_X(x) f_W(y - x) \log f_W(y - x)\,dx\,dy = p_W, \\
&f_Y(y) = \int f_X(x) f_W(y - x)\,dx,
\end{aligned} \tag{133}
\]

where $m_{X^*}^2$, $m_{W^*}^2$, and $m_{Y^*}^2$ denote the second-order moments of the optimal solutions of $X$, $W$, and $Y$,
respectively. The constraints related to the second-order moments mean that all random variables have
finite second-order moments. Also, the constraints related to $p_X$ and $p_W$ mean that the random variables $X$
and $W$ have finite entropies, respectively, where $p_X$ and $p_W$ are constants. Without loss of generality, the
zero-mean condition is assumed for all random variables (in the case of non-zero mean, all constraints
related to the second-order moments are changed into constraints related to the covariance matrices).
Using Lagrange multipliers, the problem in (132) and the constraints in (133) are reformulated as the
following equivalent problem:

\[
\min_{f_X, f_W, f_Y} \int \left(\int K(x, y, f_X, f_W, f_Y)\,dx + \tilde{K}(y, f_Y)\right) dy, \tag{134}
\]

where

\[
\begin{aligned}
K(x, y, f_X, f_W, f_Y) &= f_X(x) f_W(y - x) \Big(-\log f_Y(y) + (a_X^2 - \lambda_X) \log f_X(x) \\
&\qquad + (a_W^2 - \lambda_W) \log f_W(y - x) + \alpha_0 + \alpha_1 y^2 + \alpha_2 x^2 + \alpha_3 (y - x)^2 - \lambda(y)\Big), \\
\tilde{K}(y, f_Y) &= \lambda(y) f_Y(y).
\end{aligned} \tag{135}
\]
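The first-order partial derivatives are expressed as

\[
\begin{aligned}
K'_{f_X} \Big|_{f_X = f_{X^*},\, f_W = f_{W^*},\, f_Y = f_{Y^*}}
&= f_{W^*}(y - x) \Big(-\log f_{Y^*}(y) + (a_X^2 - \lambda_X) \log f_{X^*}(x) + (a_W^2 - \lambda_W) \log f_{W^*}(y - x) + \alpha_0 + \alpha_1 y^2 \\
&\qquad + \alpha_2 x^2 + \alpha_3 (y - x)^2 - \lambda(y) + a_X^2 - \lambda_X\Big), \\
K'_{f_W} \Big|_{f_X = f_{X^*},\, f_W = f_{W^*},\, f_Y = f_{Y^*}}
&= f_{X^*}(x) \Big(-\log f_{Y^*}(y) + (a_X^2 - \lambda_X) \log f_{X^*}(x) + (a_W^2 - \lambda_W) \log f_{W^*}(y - x) + \alpha_0 + \alpha_1 y^2 \\
&\qquad + \alpha_2 x^2 + \alpha_3 (y - x)^2 - \lambda(y) + a_W^2 - \lambda_W\Big), \\
\int K'_{f_Y}\,dx + \tilde{K}'_{f_Y} \Big|_{f_X = f_{X^*},\, f_W = f_{W^*},\, f_Y = f_{Y^*}}
&= -\frac{\int f_{X^*}(x) f_{W^*}(y - x)\,dx}{f_{Y^*}(y)} + \lambda(y).
\end{aligned} \tag{136}
\]

Due to the first-order variation condition, $\delta U[f_{X^*}, f_{W^*}, f_{Y^*}] = 0$, the optimal solutions $f_{X^*}$, $f_{W^*}$, and
$f_{Y^*}$ must satisfy the following relationships:

\[
\begin{aligned}
-\log f_{Y^*}(y) + \alpha_1 y^2 - \lambda(y) + c_Y &= 0, \\
(a_X^2 - \lambda_X) \log f_{X^*}(x) + \alpha_2 x^2 + c_X &= 0, \\
(a_W^2 - \lambda_W) \log f_{W^*}(y - x) + \alpha_3 (y - x)^2 + \alpha_0 + a_W^2 - \lambda_W - c_X - c_Y &= 0, \\
-1 + \lambda(y) &= 0, \\
a_W^2 - \lambda_W - a_X^2 + \lambda_X &= 0,
\end{aligned} \tag{137}
\]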
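and therefore,

\[
\begin{aligned}
f_{Y^*}(y) &= \exp\left\{\alpha_1 y^2 - \lambda(y) + c_Y\right\}, \\
f_{X^*}(x) &= \exp\left\{\frac{1}{a_X^2 - \lambda_X} \left(-\alpha_2 x^2 - c_X\right)\right\}, \\
f_{W^*}(y - x) &= \exp\left\{\frac{1}{a_W^2 - \lambda_W} \left(-\alpha_3 (y - x)^2 - \alpha_0 - a_W^2 + \lambda_W + c_X + c_Y\right)\right\}, \\
\lambda(y) &= 1.
\end{aligned} \tag{138}
\]

Considering the constraints in (133), the equations in (138) are expressed as

\[
\begin{aligned}
f_{Y^*}(y) &= \frac{1}{\sqrt{2\pi \left(-\frac{1}{2\alpha_1}\right)}} \exp\left\{-\frac{1}{2\left(-\frac{1}{2\alpha_1}\right)}\, y^2\right\} \sqrt{2\pi \left(-\frac{1}{2\alpha_1}\right)}\, \exp\{-\lambda(y) + c_Y\} \\
&= \frac{1}{\sqrt{2\pi m_{Y^*}^2}} \exp\left\{-\frac{1}{2 m_{Y^*}^2}\, y^2\right\}, \\
f_{X^*}(x) &= \frac{1}{\sqrt{2\pi \frac{a_X^2 - \lambda_X}{2\alpha_2}}} \exp\left\{-\frac{1}{2\, \frac{a_X^2 - \lambda_X}{2\alpha_2}}\, x^2\right\} \sqrt{2\pi \frac{a_X^2 - \lambda_X}{2\alpha_2}}\, \exp\left\{-\frac{c_X}{a_X^2 - \lambda_X}\right\} \\
&= \frac{1}{\sqrt{2\pi m_{X^*}^2}} \exp\left\{-\frac{1}{2 m_{X^*}^2}\, x^2\right\}, \\
f_{W^*}(y - x) &= \frac{1}{\sqrt{2\pi \frac{a_W^2 - \lambda_W}{2\alpha_3}}} \exp\left\{-\frac{1}{2\, \frac{a_W^2 - \lambda_W}{2\alpha_3}}\, (y - x)^2\right\} \sqrt{2\pi \frac{a_W^2 - \lambda_W}{2\alpha_3}}\, \exp\left\{\frac{-\alpha_0 - a_W^2 + \lambda_W + c_X + c_Y}{a_W^2 - \lambda_W}\right\} \\
&= \frac{1}{\sqrt{2\pi m_{W^*}^2}} \exp\left\{-\frac{1}{2 m_{W^*}^2}\, (y - x)^2\right\},
\end{aligned} \tag{139}
\]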
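where

\[
\begin{aligned}
\alpha_0 &= -(a_W^2 - \lambda_W) + c_X + c_Y + \frac{a_W^2 - \lambda_W}{2} \log 2\pi m_{W^*}^2, \\
\alpha_1 &= -\frac{1}{2 m_{Y^*}^2}, \qquad
\alpha_2 = \frac{a_X^2 - \lambda_X}{2 m_{X^*}^2}, \qquad
\alpha_3 = \frac{a_W^2 - \lambda_W}{2 m_{W^*}^2},
\end{aligned} \tag{140}
\]
\[
c_X = \frac{a_X^2 - \lambda_X}{2} \log 2\pi m_{X^*}^2, \qquad
c_Y = 1 - \frac{1}{2} \log 2\pi m_{Y^*}^2, \tag{141}
\]
\[
\begin{aligned}
a_X^2 - \lambda_X &= a_W^2 - \lambda_W \geq 1, \\
m_{X^*}^2 &= \frac{1}{2\pi e} \exp\{2 p_X\}, \qquad
m_{W^*}^2 = \frac{1}{2\pi e} \exp\{2 p_W\}, \\
m_{Y^*}^2 &= m_{X^*}^2 + m_{W^*}^2 = \frac{1}{2\pi e} \exp\{2 p_X\} + \frac{1}{2\pi e} \exp\{2 p_W\}.
\end{aligned} \tag{142}
\]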



The inequality in (142) is due to the second-order variation condition, which will be justified next.
Consider now the conditions for the second variation of the functional problem:

\[
\begin{aligned}
K''_{f_X f_X} \Big|_{*} &= \frac{(a_X^2 - \lambda_X) f_{W^*}(y - x)}{f_{X^*}(x)}, &
K''_{f_W f_W} \Big|_{*} &= \frac{(a_W^2 - \lambda_W) f_{X^*}(x)}{f_{W^*}(y - x)}, &
K''_{f_Y f_Y} \Big|_{*} &= \frac{f_{X^*}(x) f_{W^*}(y - x)}{f_{Y^*}(y)^2}, \\
K''_{f_X f_W} \Big|_{*} &= a_W^2 - \lambda_W, &
K''_{f_W f_X} \Big|_{*} &= a_X^2 - \lambda_X, \\
K''_{f_X f_Y} \Big|_{*} &= -\frac{f_{W^*}(y - x)}{f_{Y^*}(y)}, &
K''_{f_Y f_X} \Big|_{*} &= -\frac{f_{W^*}(y - x)}{f_{Y^*}(y)}, \\
K''_{f_W f_Y} \Big|_{*} &= -\frac{f_{X^*}(x)}{f_{Y^*}(y)}, &
K''_{f_Y f_W} \Big|_{*} &= -\frac{f_{X^*}(x)}{f_{Y^*}(y)},
\end{aligned} \tag{143}
\]

where $|_{*}$ denotes evaluation at $f_X = f_{X^*}$, $f_W = f_{W^*}$, $f_Y = f_{Y^*}$.
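To satisfy $\delta^2 J \geq 0$, the following condition must hold:

$$
\begin{aligned}
&\begin{pmatrix} h_X & h_W & h_Y \end{pmatrix}
\begin{pmatrix}
K''_{f_X f_X} & K''_{f_X f_W} & K''_{f_X f_Y}\\
K''_{f_W f_X} & K''_{f_W f_W} & K''_{f_W f_Y}\\
K''_{f_Y f_X} & K''_{f_Y f_W} & K''_{f_Y f_Y}
\end{pmatrix}
\begin{pmatrix} h_X \\ h_W \\ h_Y \end{pmatrix}
\Bigg|_{f_X=f_{X^*},\,f_W=f_{W^*},\,f_Y=f_{Y^*}}\\
&\quad= \Big[K''_{f_X f_X}h_X^2 + K''_{f_W f_W}h_W^2 + K''_{f_Y f_Y}h_Y^2
+ \big(K''_{f_X f_W}+K''_{f_W f_X}\big)h_X h_W\\
&\qquad\; + \big(K''_{f_W f_Y}+K''_{f_Y f_W}\big)h_W h_Y
+ \big(K''_{f_X f_Y}+K''_{f_Y f_X}\big)h_Y h_X\Big]_{f_X=f_{X^*},\,f_W=f_{W^*},\,f_Y=f_{Y^*}}\\
&\quad\geq 0.
\end{aligned}
\tag{144}
$$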

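Using the quantities defined in (143), the expression in (144) is written as follows:

$$
\begin{aligned}
&K''_{f_{X^*}f_{X^*}}h_X^2 + K''_{f_{W^*}f_{W^*}}h_W^2 + K''_{f_{Y^*}f_{Y^*}}h_Y^2
+ \big(K''_{f_{X^*}f_{W^*}}+K''_{f_{W^*}f_{X^*}}\big)h_X h_W
+ \big(K''_{f_{W^*}f_{Y^*}}+K''_{f_{Y^*}f_{W^*}}\big)h_W h_Y
+ \big(K''_{f_{X^*}f_{Y^*}}+K''_{f_{Y^*}f_{X^*}}\big)h_Y h_X\\
&= \frac{(a_x^2-\lambda_x)f_{W^*}(y-x)}{f_{X^*}(x)}h_X(x)^2
+ \frac{(a_w^2-\lambda_w)f_{X^*}(x)}{f_{W^*}(y-x)}h_W(y-x)^2
+ \frac{f_{X^*}(x)f_{W^*}(y-x)}{f_{Y^*}(y)^2}h_Y(y)^2\\
&\quad + 2(a_w^2-\lambda_w)h_X(x)h_W(y-x)
- 2\frac{f_{X^*}(x)}{f_{Y^*}(y)}h_W(y-x)h_Y(y)
- 2\frac{f_{W^*}(y-x)}{f_{Y^*}(y)}h_X(x)h_Y(y)\\
&= \frac{f_{W^*}(y-x)}{f_{X^*}(x)}\Bigg[(a_w^2-\lambda_w)h_X(x)^2
+ (a_w^2-\lambda_w)\frac{f_{X^*}(x)^2}{f_{W^*}(y-x)^2}h_W(y-x)^2
+ \frac{f_{X^*}(x)^2}{f_{Y^*}(y)^2}h_Y(y)^2\\
&\quad + 2(a_w^2-\lambda_w)\frac{f_{X^*}(x)}{f_{W^*}(y-x)}h_X(x)h_W(y-x)
- 2\frac{f_{X^*}(x)^2}{f_{W^*}(y-x)f_{Y^*}(y)}h_W(y-x)h_Y(y)
- 2\frac{f_{X^*}(x)}{f_{Y^*}(y)}h_X(x)h_Y(y)\Bigg]\\
&\geq \frac{f_{W^*}(y-x)}{f_{X^*}(x)}\left(h_X(x) + \frac{f_{X^*}(x)}{f_{W^*}(y-x)}h_W(y-x) - \frac{f_{X^*}(x)}{f_{Y^*}(y)}h_Y(y)\right)^2\\
&\geq 0,
\end{aligned}
\tag{145}
$$

where $a_w^2 - \lambda_w = a_x^2 - \lambda_x \geq 1$.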
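The completing-the-square step above can be sanity-checked numerically. The following is a minimal sketch, not part of the original proof: it treats the density values and perturbation values at a single point $(x, y)$ as free positive or real numbers, writes `c` for the common value $a_x^2-\lambda_x = a_w^2-\lambda_w$, and checks that the expanded quadratic form never falls below the squared lower bound when `c >= 1`. The function names are illustrative assumptions.

```python
import random

def second_variation_form(c, fx, fw, fy, hx, hw, hy):
    """Pointwise value of the expanded quadratic form of the second variation.

    c plays the role of a_x^2 - lambda_x = a_w^2 - lambda_w; fx, fw, fy are
    positive density values; hx, hw, hy are perturbation values.
    """
    return (c * fw / fx * hx ** 2
            + c * fx / fw * hw ** 2
            + fx * fw / fy ** 2 * hy ** 2
            + 2.0 * c * hx * hw
            - 2.0 * fx / fy * hw * hy
            - 2.0 * fw / fy * hx * hy)

def completed_square(fx, fw, fy, hx, hw, hy):
    """Lower bound (f_W/f_X) * (h_X + (f_X/f_W) h_W - (f_X/f_Y) h_Y)^2."""
    return fw / fx * (hx + fx / fw * hw - fx / fy * hy) ** 2

def check(trials=10000, seed=0):
    """Return True if the form dominates the square for random inputs with c >= 1."""
    rng = random.Random(seed)
    for _ in range(trials):
        c = 1.0 + 3.0 * rng.random()
        fx, fw, fy = (rng.uniform(0.1, 2.0) for _ in range(3))
        hx, hw, hy = (rng.uniform(-2.0, 2.0) for _ in range(3))
        q = second_variation_form(c, fx, fw, fy, hx, hw, hy)
        if q < completed_square(fx, fw, fy, hx, hw, hy) - 1e-9:
            return False
    return True
```

The gap between the two quantities is $(c-1)\,f_X f_W\,(h_X/f_X + h_W/f_W)^2$, which is why `c >= 1` is the condition that matters; with `c < 1` the form can become negative, for example at `second_variation_form(0.5, 1, 1, 1, 1, 0, 1)`.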



Therefore, the optimal solutions $f_{X^*}$, $f_{W^*}$, and $f_{Y^*}$ minimize the variational problem in (132). Even
though $f_{X^*}$, $f_{W^*}$, and $f_{Y^*}$ are only shown to be necessarily optimal, they are also sufficiently optimal since only Gaussian
density functions are in the feasible set. ■
Appendix K
Proof of Theorem 14

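Proof: Similar to the proof shown in Appendix J, we first construct the following functional problem:

$$
\begin{aligned}
\min_{f_X, f_W, f_Y}\ & \iint f_X(\mathbf{x}) f_W(\mathbf{y}-\mathbf{x})\left(-\log f_Y(\mathbf{y}) + a_x^2\log f_X(\mathbf{x}) + a_w^2\log f_W(\mathbf{y}-\mathbf{x})\right)d\mathbf{x}\,d\mathbf{y}\\
\text{s.t.}\ & \iint f_X(\mathbf{x}) f_W(\mathbf{y}-\mathbf{x})\,d\mathbf{x}\,d\mathbf{y} = 1,\\
& \iint \mathbf{y}\mathbf{y}^T f_X(\mathbf{x}) f_W(\mathbf{y}-\mathbf{x})\,d\mathbf{x}\,d\mathbf{y} = \Omega_{X^*} + \Omega_{W^*},\\
& \iint \mathbf{x}\mathbf{x}^T f_X(\mathbf{x}) f_W(\mathbf{y}-\mathbf{x})\,d\mathbf{x}\,d\mathbf{y} = \Omega_{X^*},\\
& \iint (\mathbf{y}-\mathbf{x})(\mathbf{y}-\mathbf{x})^T f_X(\mathbf{x}) f_W(\mathbf{y}-\mathbf{x})\,d\mathbf{x}\,d\mathbf{y} = \Omega_{W^*},\\
& -\iint f_X(\mathbf{x}) f_W(\mathbf{y}-\mathbf{x})\log f_X(\mathbf{x})\,d\mathbf{x}\,d\mathbf{y} = p_X,\\
& -\iint f_X(\mathbf{x}) f_W(\mathbf{y}-\mathbf{x})\log f_W(\mathbf{y}-\mathbf{x})\,d\mathbf{x}\,d\mathbf{y} = p_W,\\
& f_Y(\mathbf{y}) = \int f_X(\mathbf{x}) f_W(\mathbf{y}-\mathbf{x})\,d\mathbf{x},
\end{aligned}
\tag{146}
$$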

where $p_X$ and $p_W$ are constants, and the constraints related to these constants mean that the entropies of $X$
and $W$ are finite. The matrices $\Omega_{X^*}$ and $\Omega_{W^*}$ denote the correlation matrices of the optimal random
vectors $X^*$ and $W^*$, respectively. The constraints related to these correlation matrices mean that the
correlation matrices of the random vectors $X$ and $W$ exist. Without loss of generality, the mean vectors of
$X$ and $W$ are assumed to be zero (if $X$ and $W$ have non-zero mean vectors, the constraints related to
the correlation matrices are replaced by the corresponding constraints on the covariance matrices).

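Using Lagrange multipliers, the problem in (146) is changed into the following optimization problem:

$$
\min_{f_X, f_W, f_Y}\ \int\left(\int K(\mathbf{x},\mathbf{y},f_X,f_W,f_Y)\,d\mathbf{x}\right) + \tilde K(\mathbf{y},f_Y)\,d\mathbf{y},
\tag{147}
$$

where

$$
\begin{aligned}
K(\mathbf{x},\mathbf{y},f_X,f_W,f_Y) &= f_X(\mathbf{x})f_W(\mathbf{y}-\mathbf{x})\Bigg(-\log f_Y(\mathbf{y}) + (a_x^2-\lambda_x)\log f_X(\mathbf{x}) + (a_w^2-\lambda_w)\log f_W(\mathbf{y}-\mathbf{x}) + \alpha_0\\
&\quad + \sum_{i=1}^{n}\sum_{j=1}^{n}\gamma_{ij}y_i y_j + \sum_{i=1}^{n}\sum_{j=1}^{n}\phi_{ij}x_i x_j + \sum_{i=1}^{n}\sum_{j=1}^{n}\theta_{ij}(y_i-x_i)(y_j-x_j) - \lambda(\mathbf{y})\Bigg),\\
\tilde K(\mathbf{y},f_Y) &= \lambda(\mathbf{y})f_Y(\mathbf{y}).
\end{aligned}
\tag{148}
$$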
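Then,

$$
\begin{aligned}
K'_{f_X} &= f_W(\mathbf{y}-\mathbf{x})\Bigg(-\log f_Y(\mathbf{y}) + (a_x^2-\lambda_x)\log f_X(\mathbf{x}) + (a_w^2-\lambda_w)\log f_W(\mathbf{y}-\mathbf{x}) + \alpha_0\\
&\quad + \sum_{i=1}^{n}\sum_{j=1}^{n}\gamma_{ij}y_i y_j + \sum_{i=1}^{n}\sum_{j=1}^{n}\phi_{ij}x_i x_j + \sum_{i=1}^{n}\sum_{j=1}^{n}\theta_{ij}(y_i-x_i)(y_j-x_j) - \lambda(\mathbf{y}) + a_x^2-\lambda_x\Bigg),\\
K'_{f_W} &= f_X(\mathbf{x})\Bigg(-\log f_Y(\mathbf{y}) + (a_x^2-\lambda_x)\log f_X(\mathbf{x}) + (a_w^2-\lambda_w)\log f_W(\mathbf{y}-\mathbf{x}) + \alpha_0\\
&\quad + \sum_{i=1}^{n}\sum_{j=1}^{n}\gamma_{ij}y_i y_j + \sum_{i=1}^{n}\sum_{j=1}^{n}\phi_{ij}x_i x_j + \sum_{i=1}^{n}\sum_{j=1}^{n}\theta_{ij}(y_i-x_i)(y_j-x_j) - \lambda(\mathbf{y}) + a_w^2-\lambda_w\Bigg),\\
\Big(\textstyle\int K\,d\mathbf{x} + \tilde K\Big)'_{f_Y} &= -\frac{\int f_X(\mathbf{x})f_W(\mathbf{y}-\mathbf{x})\,d\mathbf{x}}{f_Y(\mathbf{y})} + \lambda(\mathbf{y}).
\end{aligned}
\tag{149}
$$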
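To satisfy $\delta U[f_{X^*}, f_{W^*}, f_{Y^*}] = 0$,

$$
\begin{aligned}
-\log f_{Y^*}(\mathbf{y}) + \sum_{i=1}^{n}\sum_{j=1}^{n}\gamma_{ij}y_i y_j - \lambda(\mathbf{y}) + c_Y &= 0,\\
(a_x^2-\lambda_x)\log f_{X^*}(\mathbf{x}) + \sum_{i=1}^{n}\sum_{j=1}^{n}\phi_{ij}x_i x_j + c_X &= 0,\\
(a_w^2-\lambda_w)\log f_{W^*}(\mathbf{y}-\mathbf{x}) + \sum_{i=1}^{n}\sum_{j=1}^{n}\theta_{ij}(y_i-x_i)(y_j-x_j) + \alpha_0 + a_w^2 - \lambda_w - c_X - c_Y &= 0,\\
-1 + \lambda(\mathbf{y}) &= 0,\\
a_w^2 - \lambda_w - a_x^2 + \lambda_x &= 0.
\end{aligned}
\tag{150}
$$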

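Since the equations in (150) must be satisfied for any $\mathbf{x}$ and $\mathbf{y}$, the optimal solutions $f_{X^*}$, $f_{W^*}$, and $f_{Y^*}$
are expressed as

$$
\begin{aligned}
f_{Y^*}(\mathbf{y}) &= \exp\left\{\sum_{i=1}^{n}\sum_{j=1}^{n}\gamma_{ij}y_i y_j - \lambda(\mathbf{y}) + c_Y\right\}
= \exp\left\{\mathbf{y}^T\Gamma\mathbf{y} - 1 + c_Y\right\},\\
f_{X^*}(\mathbf{x}) &= \exp\left\{-\frac{1}{a_x^2-\lambda_x}\left(\sum_{i=1}^{n}\sum_{j=1}^{n}\phi_{ij}x_i x_j + c_X\right)\right\}
= \exp\left\{-\frac{1}{a_x^2-\lambda_x}\left(\mathbf{x}^T\Phi\mathbf{x} + c_X\right)\right\},\\
f_{W^*}(\mathbf{y}-\mathbf{x}) &= \exp\left\{\frac{1}{a_w^2-\lambda_w}\left(-\sum_{i=1}^{n}\sum_{j=1}^{n}\theta_{ij}(y_i-x_i)(y_j-x_j) - \alpha_0 - a_w^2 + \lambda_w + c_X + c_Y\right)\right\}\\
&= \exp\left\{-\frac{1}{a_w^2-\lambda_w}\left((\mathbf{y}-\mathbf{x})^T\Theta(\mathbf{y}-\mathbf{x}) + \alpha_0 + a_w^2 - \lambda_w - c_X - c_Y\right)\right\},\\
\lambda(\mathbf{y}) &= 1.
\end{aligned}
\tag{151}
$$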
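Considering the constraints in (146), the equations in (151) are further processed as

$$
\begin{aligned}
f_{Y^*}(\mathbf{y}) &= (2\pi)^{-\frac{n}{2}}\left|-\tfrac{1}{2}\Gamma^{-1}\right|^{-\frac{1}{2}}\exp\left\{-\tfrac{1}{2}\mathbf{y}^T\left(-\tfrac{1}{2}\Gamma^{-1}\right)^{-1}\mathbf{y}\right\}(2\pi)^{\frac{n}{2}}\left|-\tfrac{1}{2}\Gamma^{-1}\right|^{\frac{1}{2}}\exp\left\{-\lambda(\mathbf{y})+c_Y\right\}\\
&= (2\pi)^{-\frac{n}{2}}\left|\Omega_{X^*}+\Omega_{W^*}\right|^{-\frac{1}{2}}\exp\left\{-\tfrac{1}{2}\mathbf{y}^T\left(\Omega_{X^*}+\Omega_{W^*}\right)^{-1}\mathbf{y}\right\},\\
f_{X^*}(\mathbf{x}) &= (2\pi)^{-\frac{n}{2}}\left|\tfrac{a_x^2-\lambda_x}{2}\Phi^{-1}\right|^{-\frac{1}{2}}\exp\left\{-\tfrac{1}{2}\mathbf{x}^T\left(\tfrac{a_x^2-\lambda_x}{2}\Phi^{-1}\right)^{-1}\mathbf{x}\right\}(2\pi)^{\frac{n}{2}}\left|\tfrac{a_x^2-\lambda_x}{2}\Phi^{-1}\right|^{\frac{1}{2}}\exp\left\{-\frac{c_X}{a_x^2-\lambda_x}\right\}\\
&= (2\pi)^{-\frac{n}{2}}\left|\Omega_{X^*}\right|^{-\frac{1}{2}}\exp\left\{-\tfrac{1}{2}\mathbf{x}^T\Omega_{X^*}^{-1}\mathbf{x}\right\},\\
f_{W^*}(\mathbf{y}-\mathbf{x}) &= (2\pi)^{-\frac{n}{2}}\left|\tfrac{a_w^2-\lambda_w}{2}\Theta^{-1}\right|^{-\frac{1}{2}}\exp\left\{-\tfrac{1}{2}(\mathbf{y}-\mathbf{x})^T\left(\tfrac{a_w^2-\lambda_w}{2}\Theta^{-1}\right)^{-1}(\mathbf{y}-\mathbf{x})\right\}\\
&\quad\times(2\pi)^{\frac{n}{2}}\left|\tfrac{a_w^2-\lambda_w}{2}\Theta^{-1}\right|^{\frac{1}{2}}\exp\left\{\frac{-\alpha_0-a_w^2+\lambda_w+c_X+c_Y}{a_w^2-\lambda_w}\right\}\\
&= (2\pi)^{-\frac{n}{2}}\left|\Omega_{W^*}\right|^{-\frac{1}{2}}\exp\left\{-\tfrac{1}{2}(\mathbf{y}-\mathbf{x})^T\Omega_{W^*}^{-1}(\mathbf{y}-\mathbf{x})\right\},
\end{aligned}
\tag{152}
$$

where

$$
\begin{aligned}
\alpha_0 &= -(a_w^2-\lambda_w) + c_X + c_Y + \frac{a_w^2-\lambda_w}{2}\log\left((2\pi)^n\left|\Omega_{W^*}\right|\right),\\
\Gamma &= -\tfrac{1}{2}\left(\Omega_{X^*}+\Omega_{W^*}\right)^{-1},\qquad
\Phi = \frac{a_x^2-\lambda_x}{2}\Omega_{X^*}^{-1},\qquad
\Theta = \frac{a_w^2-\lambda_w}{2}\Omega_{W^*}^{-1},
\end{aligned}
\tag{153}
$$

$$
c_X = \frac{a_x^2-\lambda_x}{2}\log\left((2\pi)^n\left|\Omega_{X^*}\right|\right),\qquad
c_Y = 1 - \frac{1}{2}\log\left((2\pi)^n\left|\Omega_{X^*}+\Omega_{W^*}\right|\right),
\tag{154}
$$

$$
a_x^2 - \lambda_x \geq 1,
\tag{155}
$$

$$
\left|\Omega_{X^*}\right| = \left(\frac{1}{2\pi e}\exp\left\{\frac{2}{n}p_X\right\}\right)^n,
\tag{156}
$$

$$
\left|\Omega_{W^*}\right| = \left(\frac{1}{2\pi e}\exp\left\{\frac{2}{n}p_W\right\}\right)^n.
\tag{157}
$$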
Without loss of generality, the matrices $\Omega_{X^*}$ and $\Omega_{W^*}$ are assumed to be invertible for the same
reasons mentioned in Appendix B. The relationships in (155) are obtained from the second-order
variation condition, which will be shown later in this proof.
Therefore, we can always find the Lagrange multipliers.
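The entropy constraints only pin down the determinants of the correlation matrices, as in (156) and (157). The sketch below illustrates this for the 2-dimensional case, assuming zero-mean Gaussian vectors and entropies measured in nats; the helper names and the two example matrices are illustrative assumptions, not quantities from the paper.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def gaussian_entropy_2d(cov):
    """Differential entropy of a zero-mean 2-D Gaussian: 0.5*log((2*pi*e)^2 * |cov|)."""
    return 0.5 * math.log((2.0 * math.pi * math.e) ** 2 * det2(cov))

def det_from_entropy(p, n):
    """Determinant implied by entropy p in dimension n: ((1/(2*pi*e)) * exp(2p/n))^n."""
    return (math.exp(2.0 * p / n) / (2.0 * math.pi * math.e)) ** n

# Two different matrices with the same determinant share the same entropy,
# so the entropy value alone cannot distinguish them.
cov_a = [[2.0, 0.3], [0.3, 1.0]]
cov_b = [[1.91, 0.0], [0.0, 1.0]]
p_a = gaussian_entropy_2d(cov_a)
p_b = gaussian_entropy_2d(cov_b)
```

Here `det_from_entropy(p_a, 2)` recovers the determinant 1.91 up to rounding, and `p_a` and `p_b` agree, which is consistent with the later observation in this proof that more than one Gaussian density satisfies the variation conditions.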
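Now, consider the second-order variation condition:

$$
\begin{aligned}
K''_{f_X f_X} &= \frac{(a_x^2-\lambda_x)\,f_{W^*}(\mathbf{y}-\mathbf{x})}{f_{X^*}(\mathbf{x})}, &
K''_{f_W f_W} &= \frac{(a_w^2-\lambda_w)\,f_{X^*}(\mathbf{x})}{f_{W^*}(\mathbf{y}-\mathbf{x})}, &
\Big(\textstyle\int K\,d\mathbf{x} + \tilde K\Big)''_{f_Y f_Y} &= \frac{f_{X^*}(\mathbf{x})\,f_{W^*}(\mathbf{y}-\mathbf{x})}{f_{Y^*}(\mathbf{y})^2},\\
K''_{f_X f_W} &= a_w^2-\lambda_w, &
K''_{f_W f_X} &= a_x^2-\lambda_x, & &\\
K''_{f_X f_Y} &= -\frac{f_{W^*}(\mathbf{y}-\mathbf{x})}{f_{Y^*}(\mathbf{y})}, &
K''_{f_Y f_X} &= -\frac{f_{W^*}(\mathbf{y}-\mathbf{x})}{f_{Y^*}(\mathbf{y})}, & &\\
K''_{f_W f_Y} &= -\frac{f_{X^*}(\mathbf{x})}{f_{Y^*}(\mathbf{y})}, &
K''_{f_Y f_W} &= -\frac{f_{X^*}(\mathbf{x})}{f_{Y^*}(\mathbf{y})}, & &
\end{aligned}
\tag{158}
$$

where every derivative is evaluated at $f_X = f_{X^*}$, $f_W = f_{W^*}$, $f_Y = f_{Y^*}$.

To satisfy $\delta^2 U[f_{X^*}, f_{W^*}, f_{Y^*}] \geq 0$, the following must hold:

$$
\begin{aligned}
&\begin{pmatrix} h_X & h_W & h_Y \end{pmatrix}
\begin{pmatrix}
K''_{f_{X^*}f_{X^*}} & K''_{f_{X^*}f_{W^*}} & K''_{f_{X^*}f_{Y^*}}\\
K''_{f_{W^*}f_{X^*}} & K''_{f_{W^*}f_{W^*}} & K''_{f_{W^*}f_{Y^*}}\\
K''_{f_{Y^*}f_{X^*}} & K''_{f_{Y^*}f_{W^*}} & K''_{f_{Y^*}f_{Y^*}}
\end{pmatrix}
\begin{pmatrix} h_X \\ h_W \\ h_Y \end{pmatrix}\\
&\quad= K''_{f_{X^*}f_{X^*}}h_X^2 + K''_{f_{W^*}f_{W^*}}h_W^2 + K''_{f_{Y^*}f_{Y^*}}h_Y^2
+ \big(K''_{f_{X^*}f_{W^*}}+K''_{f_{W^*}f_{X^*}}\big)h_X h_W\\
&\qquad + \big(K''_{f_{W^*}f_{Y^*}}+K''_{f_{Y^*}f_{W^*}}\big)h_W h_Y
+ \big(K''_{f_{X^*}f_{Y^*}}+K''_{f_{Y^*}f_{X^*}}\big)h_Y h_X\\
&\quad\geq 0.
\end{aligned}
\tag{159}
$$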
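Using the quantities defined in (158), the expression in (159) is written as follows:

$$
\begin{aligned}
&\frac{(a_x^2-\lambda_x)f_{W^*}(\mathbf{y}-\mathbf{x})}{f_{X^*}(\mathbf{x})}h_X(\mathbf{x})^2
+ \frac{(a_w^2-\lambda_w)f_{X^*}(\mathbf{x})}{f_{W^*}(\mathbf{y}-\mathbf{x})}h_W(\mathbf{y}-\mathbf{x})^2
+ \frac{f_{X^*}(\mathbf{x})f_{W^*}(\mathbf{y}-\mathbf{x})}{f_{Y^*}(\mathbf{y})^2}h_Y(\mathbf{y})^2\\
&\quad + 2(a_w^2-\lambda_w)h_X(\mathbf{x})h_W(\mathbf{y}-\mathbf{x})
- 2\frac{f_{X^*}(\mathbf{x})}{f_{Y^*}(\mathbf{y})}h_W(\mathbf{y}-\mathbf{x})h_Y(\mathbf{y})
- 2\frac{f_{W^*}(\mathbf{y}-\mathbf{x})}{f_{Y^*}(\mathbf{y})}h_X(\mathbf{x})h_Y(\mathbf{y})\\
&= \frac{f_{W^*}(\mathbf{y}-\mathbf{x})}{f_{X^*}(\mathbf{x})}\Bigg[(a_w^2-\lambda_w)h_X(\mathbf{x})^2
+ (a_w^2-\lambda_w)\frac{f_{X^*}(\mathbf{x})^2}{f_{W^*}(\mathbf{y}-\mathbf{x})^2}h_W(\mathbf{y}-\mathbf{x})^2
+ \frac{f_{X^*}(\mathbf{x})^2}{f_{Y^*}(\mathbf{y})^2}h_Y(\mathbf{y})^2\\
&\quad + 2(a_w^2-\lambda_w)\frac{f_{X^*}(\mathbf{x})}{f_{W^*}(\mathbf{y}-\mathbf{x})}h_X(\mathbf{x})h_W(\mathbf{y}-\mathbf{x})
- 2\frac{f_{X^*}(\mathbf{x})^2}{f_{W^*}(\mathbf{y}-\mathbf{x})f_{Y^*}(\mathbf{y})}h_W(\mathbf{y}-\mathbf{x})h_Y(\mathbf{y})
- 2\frac{f_{X^*}(\mathbf{x})}{f_{Y^*}(\mathbf{y})}h_X(\mathbf{x})h_Y(\mathbf{y})\Bigg]\\
&\geq \frac{f_{W^*}(\mathbf{y}-\mathbf{x})}{f_{X^*}(\mathbf{x})}\left(h_X(\mathbf{x}) + \frac{f_{X^*}(\mathbf{x})}{f_{W^*}(\mathbf{y}-\mathbf{x})}h_W(\mathbf{y}-\mathbf{x}) - \frac{f_{X^*}(\mathbf{x})}{f_{Y^*}(\mathbf{y})}h_Y(\mathbf{y})\right)^2\\
&\geq 0,
\end{aligned}
\tag{160}
$$

where $a_w^2 - \lambda_w = a_x^2 - \lambda_x \geq 1$.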

Therefore, the optimal solutions $f_{X^*}$, $f_{W^*}$, and $f_{Y^*}$ minimize the variational problem in (146). Even
though $f_{X^*}$, $f_{W^*}$, and $f_{Y^*}$ are only necessarily minimum solutions, multi-variate Gaussian density functions are
the only ones in the feasible set. However, unlike Theorem 13, the correlation matrices are not explicitly
determined, as shown in (156) and (157), and more than one Gaussian density function
satisfies the first-order and the second-order variation conditions. Therefore, we need an additional step to
determine the correlation matrices $\Omega_{X^*}$ and $\Omega_{W^*}$ as follows.

Based on the first-order and the second-order variation conditions, we know that the optimal solutions of the
functional problem in (146) are multi-variate Gaussian density functions $f_{X^*}$ and $f_{W^*}$ whose correlation
matrices are $\Omega_{X^*}$ and $\Omega_{W^*}$, respectively. Therefore, the inequality in (71) is expressed as

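$$
\begin{aligned}
&h(a_x X + a_w W) - a_x^2 h(X) - a_w^2 h(W)\\
&\geq h(a_x X^* + a_w W^*) - a_x^2 h(X^*) - a_w^2 h(W^*)\\
&= \frac{1}{2}\log\left((2\pi e)^n\left|a_x^2\Omega_{X^*} + a_w^2\Omega_{W^*}\right|\right) - \frac{a_x^2}{2}\log\left((2\pi e)^n\left|\Omega_{X^*}\right|\right) - \frac{a_w^2}{2}\log\left((2\pi e)^n\left|\Omega_{W^*}\right|\right)\\
&\geq 0.
\end{aligned}
\tag{161}
$$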
Since $\log|\cdot|$ is a concave function and $a_x^2 + a_w^2 = 1$, the inequality in (161) is proved using Jensen's
inequality. Therefore,

$$
h(a_x X + a_w W) \geq a_x^2 h(X) + a_w^2 h(W),
\tag{162}
$$

and the proof is completed.

Remark 12: In (161), equality holds if and only if $\Omega_{X^*} = \Omega_{W^*}$. Since the optimal multi-variate
Gaussian density functions have zero mean vectors, in this case, the correlation matrices are equal to the 
covariance matrices. Therefore, the equality condition requires identical covariance matrices. However, 
the equality condition is not required in the proof of EPI. 

■ 
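The Jensen step used for (161) relies on the concavity of the log-determinant on positive definite matrices. The following minimal sketch checks that step numerically for 2x2 matrices; it assumes $a_x^2 + a_w^2 = 1$ (so the $(2\pi e)^n$ factors cancel) and uses illustrative matrices and function names that do not come from the paper.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def mix(a, m1, m2):
    """Convex combination a*m1 + (1-a)*m2 of two 2x2 matrices."""
    return [[a * m1[i][j] + (1.0 - a) * m2[i][j] for j in range(2)]
            for i in range(2)]

def epi_gap(ax2, omega_x, omega_w):
    """0.5*log|ax2*Ox + aw2*Ow| - (ax2/2)*log|Ox| - (aw2/2)*log|Ow|, with aw2 = 1 - ax2."""
    aw2 = 1.0 - ax2
    lhs = 0.5 * math.log(det2(mix(ax2, omega_x, omega_w)))
    rhs = 0.5 * ax2 * math.log(det2(omega_x)) + 0.5 * aw2 * math.log(det2(omega_w))
    return lhs - rhs

omega_x = [[2.0, 0.5], [0.5, 1.0]]
omega_w = [[1.0, -0.2], [-0.2, 3.0]]
gap_distinct = epi_gap(0.3, omega_x, omega_w)   # strictly positive
gap_equal = epi_gap(0.3, omega_x, omega_x)      # zero up to rounding
```

With distinct matrices the gap is strictly positive, and with identical matrices it vanishes, which matches the equality condition stated in Remark 12.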

Appendix L 
Proof of Theorem 15

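Proof: Now, construct the following variational problem, which represents the inequality in (38) and
the required constraints:

$$
\min_{f_X, f_Y}\ \iint f_X(x) f_W(y-x)\left(-\mu\log f_Y(y) + \log f_X(x) + \mu(\mu-1)\log f_W(y-x)\right)dx\,dy
\tag{163}
$$

$$
\begin{aligned}
\text{s.t.}\ & \iint f_X(x) f_W(y-x)\,dx\,dy = 1,\\
& \iint (y-\mu_Y)^2 f_X(x) f_W(y-x)\,dx\,dy = \sigma_{Y^*}^2,\\
& \iint \left((y-\mu_Y)^2 - (x-\mu_X)^2 - (y-x-\mu_W)^2\right) f_X(x) f_W(y-x)\,dx\,dy = 0,\\
& \iint (x-\mu_X)^2 f_X(x) f_W(y-x)\,dx\,dy \leq r^2,\\
& -\iint f_X(x) f_W(y-x)\log f_X(x)\,dx\,dy = p,\\
& f_Y(y) = \int f_X(x) f_W(y-x)\,dx,
\end{aligned}
\tag{164}
$$

where $p$ and $r$ are constants, and $\sigma_{Y^*}^2$ stands for the variance of the optimal solution $Y^*$.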



Using Lagrange multipliers, the functional problem in (163) is expressed as

$$
\min_{f_X, f_Y}\ \int\left(\int K(x,y,f_X,f_Y)\,dx\right) + \tilde K(y,f_Y)\,dy,
\tag{165}
$$

where

$$
\begin{aligned}
K(x,y,f_X,f_Y) &= f_X(x)f_W(y-x)\Big(-\mu\log f_Y(y) + \log f_X(x) + \mu(\mu-1)\log f_W(y-x) + \alpha_0\\
&\quad + \beta_1(y-\mu_Y)^2 + \beta_2(y-\mu_Y)^2 - \beta_2(x-\mu_X)^2 - \beta_2(y-x-\mu_W)^2 + \beta_3(x-\mu_X)^2\\
&\quad - \gamma_1\log f_X(x) - \lambda(y)\Big),\\
\tilde K(y,f_Y) &= \lambda(y)f_Y(y).
\end{aligned}
\tag{166}
$$
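Due to the first-order variation condition,

$$
\begin{aligned}
K'_{f_X}\Big|_{f_X=f_{X^*},\,f_Y=f_{Y^*}}
&= f_W(y-x)\Big(-\mu\log f_{Y^*}(y) + \log f_{X^*}(x) + \mu(\mu-1)\log f_W(y-x) + \alpha_0\\
&\quad + \beta_1(y-\mu_Y)^2 + \beta_2(y-\mu_Y)^2 - \beta_2(x-\mu_X)^2 - \beta_2(y-x-\mu_W)^2\\
&\quad + \beta_3(x-\mu_X)^2 - \gamma_1\log f_{X^*}(x) - \lambda(y) + 1 - \gamma_1\Big)\\
&= 0,
\end{aligned}
\tag{167}
$$

$$
\begin{aligned}
\Big(\textstyle\int K\,dx + \tilde K\Big)'_{f_Y}\Big|_{f_X=f_{X^*},\,f_Y=f_{Y^*}}
&= -\mu\frac{\int f_{X^*}(x)f_W(y-x)\,dx}{f_{Y^*}(y)} + \lambda(y)\\
&= 0.
\end{aligned}
\tag{168}
$$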
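Since the equations in (167) and (168) must be satisfied for any $x$ and $y$,

$$
\begin{aligned}
\lambda(y) &= \mu,\\
f_{Y^*}(y) &= \exp\left\{\frac{1}{\mu}\left((\beta_1+\beta_2)(y-\mu_{Y^*})^2 + c_Y\right)\right\}\\
&= \frac{1}{\sqrt{2\pi\left(-\frac{\mu}{2(\beta_1+\beta_2)}\right)}}\exp\left\{-\frac{(y-\mu_{Y^*})^2}{2\left(-\frac{\mu}{2(\beta_1+\beta_2)}\right)}\right\}\sqrt{2\pi\left(-\frac{\mu}{2(\beta_1+\beta_2)}\right)}\exp\left\{\frac{c_Y}{\mu}\right\},\\
f_W(y-x) &= \exp\left\{\frac{1}{\mu(\mu-1)}\left(\beta_2(y-x-\mu_W)^2 - c_W\right)\right\}\\
&= \frac{1}{\sqrt{2\pi\left(-\frac{\mu(\mu-1)}{2\beta_2}\right)}}\exp\left\{-\frac{(y-x-\mu_W)^2}{2\left(-\frac{\mu(\mu-1)}{2\beta_2}\right)}\right\}\sqrt{2\pi\left(-\frac{\mu(\mu-1)}{2\beta_2}\right)}\exp\left\{-\frac{c_W}{\mu(\mu-1)}\right\},\\
f_{X^*}(x) &= \exp\left\{\frac{1}{1-\gamma_1}\left((\beta_2-\beta_3)(x-\mu_{X^*})^2 - \alpha_0 + \mu - 1 + \gamma_1 + c_W + c_Y\right)\right\}\\
&= \frac{1}{\sqrt{2\pi\left(-\frac{1-\gamma_1}{2(\beta_2-\beta_3)}\right)}}\exp\left\{-\frac{(x-\mu_{X^*})^2}{2\left(-\frac{1-\gamma_1}{2(\beta_2-\beta_3)}\right)}\right\}\sqrt{2\pi\left(-\frac{1-\gamma_1}{2(\beta_2-\beta_3)}\right)}\exp\left\{\frac{-\alpha_0+\mu-1+\gamma_1+c_W+c_Y}{1-\gamma_1}\right\}.
\end{aligned}
\tag{169}
$$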
Considering the constraints in (164), the equations in (169) are further processed as follows:

$$
\begin{aligned}
f_{Y^*}(y) &= \frac{1}{\sqrt{2\pi\sigma_{Y^*}^2}}\exp\left\{-\frac{(y-\mu_{Y^*})^2}{2\sigma_{Y^*}^2}\right\},\\
f_W(y-x) &= \frac{1}{\sqrt{2\pi\sigma_W^2}}\exp\left\{-\frac{(y-x-\mu_W)^2}{2\sigma_W^2}\right\},\\
f_{X^*}(x) &= \frac{1}{\sqrt{2\pi\sigma_{X^*}^2}}\exp\left\{-\frac{(x-\mu_{X^*})^2}{2\sigma_{X^*}^2}\right\},
\end{aligned}
\tag{170}
$$
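where

$$
\begin{aligned}
\alpha_0 &= \mu - (1-\gamma_1) + c_W + c_Y + \frac{1-\gamma_1}{2}\log\left(2\pi\sigma_{X^*}^2\right)\\
&= \mu - (1-\gamma_1) + \frac{1-\gamma_1}{2}\log\left(2\pi\sigma_{X^*}^2\right) - \frac{\mu}{2}\log\left(2\pi\sigma_{Y^*}^2\right) + \frac{\mu(\mu-1)}{2}\log\left(2\pi\sigma_W^2\right),\\
\beta_1 &= -\beta_2 - \frac{\mu}{2\sigma_{Y^*}^2} = \frac{\mu(\mu-1)}{2\sigma_W^2} - \frac{\mu}{2\sigma_{Y^*}^2},\\
\beta_2 &= -\frac{\mu(\mu-1)}{2\sigma_W^2},\\
\beta_3 &= \beta_2 + \frac{1-\gamma_1}{2\sigma_{X^*}^2} = -\frac{\mu(\mu-1)}{2\sigma_W^2} + \frac{1-\gamma_1}{2\sigma_{X^*}^2} \geq 0,
\end{aligned}
\tag{171}
$$

$$
\sigma_{X^*}^2 = \frac{1}{2\pi e}\exp\{2p\} \leq r^2,
\tag{172}
$$

$$
\gamma_1 \leq 1 - \mu.
\tag{173}
$$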

The constant $p$ must be chosen to satisfy the inequality in (172) due to Theorem 1. The inequality in
(173) is due to the second-order variation condition, which will be presented later in this proof. Therefore,
by appropriately choosing $p$, the Lagrange multipliers always exist, and therefore, the necessary optimal
solutions, which are Gaussian, exist.

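To make the second variation positive, we need the positive-definiteness of the following matrix:

$$
\begin{pmatrix}
K''_{f_X f_X} & K''_{f_X f_Y}\\
K''_{f_Y f_X} & K''_{f_Y f_Y}
\end{pmatrix}
\Bigg|_{f_X=f_{X^*},\,f_Y=f_{Y^*}},
\tag{174}
$$

and it requires the following:

$$
\begin{aligned}
&\begin{pmatrix} h_X & h_Y \end{pmatrix}
\begin{pmatrix}
K''_{f_X f_X} & K''_{f_X f_Y}\\
K''_{f_Y f_X} & K''_{f_Y f_Y}
\end{pmatrix}
\begin{pmatrix} h_X \\ h_Y \end{pmatrix}
\Bigg|_{f_X=f_{X^*},\,f_Y=f_{Y^*}}\\
&\quad= \Big[K''_{f_X f_X}h_X^2 + K''_{f_Y f_Y}h_Y^2 + \big(K''_{f_X f_Y}+K''_{f_Y f_X}\big)h_Y h_X\Big]_{f_X=f_{X^*},\,f_Y=f_{Y^*}}\\
&\quad\geq 0,
\end{aligned}
\tag{175}
$$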
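where $h_X$ and $h_Y$ are arbitrary admissible functions.

Since $K''_{f_X f_X}$, $K''_{f_X f_Y}$, $K''_{f_Y f_X}$, and $K''_{f_Y f_Y}$ are defined as

$$
\begin{aligned}
K''_{f_X f_X} &= \frac{(1-\gamma_1)f_W(y-x)}{f_X(x)}, &
K''_{f_X f_Y} &= -\frac{\mu f_W(y-x)}{f_Y(y)},\\
K''_{f_Y f_X} &= -\frac{\mu f_W(y-x)}{f_Y(y)}, &
K''_{f_Y f_Y} &= \frac{\mu f_X(x)f_W(y-x)}{f_Y(y)^2},
\end{aligned}
\tag{176}
$$

the expression in (175) requires the following:

$$
\begin{aligned}
&\frac{(1-\gamma_1)f_W(y-x)}{f_{X^*}(x)}h_X(x)^2 - 2\frac{\mu f_W(y-x)}{f_{Y^*}(y)}h_X(x)h_Y(y) + \frac{\mu f_{X^*}(x)f_W(y-x)}{f_{Y^*}(y)^2}h_Y(y)^2\\
&\geq \frac{\mu f_W(y-x)}{f_{X^*}(x)}\left(h_X(x) - \frac{f_{X^*}(x)}{f_{Y^*}(y)}h_Y(y)\right)^2 \geq 0,
\end{aligned}
\tag{177}
$$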

where $\gamma_1 \leq 1-\mu$. Similar to the complementary slackness in KKT conditions, when $\beta_3 = 0$ in (171),
$\sigma_{X^*}^2 = (1-\gamma_1)\mu^{-1}(\mu-1)^{-1}\sigma_W^2$, and it requires $(1-\gamma_1)\mu^{-1}(\mu-1)^{-1}\sigma_W^2 \leq r^2$ (if $\gamma_1 = 1-\mu$, then
$\sigma_{X^*}^2 = (\mu-1)^{-1}\sigma_W^2$). Otherwise, $\sigma_{X^*}^2 = r^2 < (1-\gamma_1)\mu^{-1}(\mu-1)^{-1}\sigma_W^2$.
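The two cases above can be made concrete with a small numerical sketch. It assumes the multiplier expressions obtained by matching the Gaussian exponents, namely $\beta_2 = -\mu(\mu-1)/(2\sigma_W^2)$, $\beta_1 = -\mu/(2\sigma_{Y^*}^2) - \beta_2$, and $\beta_3 = \beta_2 + (1-\gamma_1)/(2\sigma_{X^*}^2)$, and it further assumes $\sigma_{Y^*}^2 = \sigma_{X^*}^2 + \sigma_W^2$ with $\mu > 1$. The function names and the numeric values are illustrative only.

```python
def multipliers(mu, gamma1, var_x, var_w):
    """Return (beta1, beta2, beta3) for mu > 1, given variances of X* and W."""
    var_y = var_x + var_w                      # assumed: X* and W independent
    beta2 = -mu * (mu - 1.0) / (2.0 * var_w)
    beta1 = -mu / (2.0 * var_y) - beta2
    beta3 = beta2 + (1.0 - gamma1) / (2.0 * var_x)
    return beta1, beta2, beta3

def unconstrained_var_x(mu, gamma1, var_w):
    """Variance of X* at which beta3 = 0: (1 - gamma1) * var_w / (mu * (mu - 1))."""
    return (1.0 - gamma1) * var_w / (mu * (mu - 1.0))

mu, var_w = 2.0, 3.0
gamma1 = 1.0 - mu                              # boundary case gamma1 = 1 - mu
var_free = unconstrained_var_x(mu, gamma1, var_w)
_, _, beta3_free = multipliers(mu, gamma1, var_free, var_w)   # inactive constraint
_, _, beta3_tight = multipliers(mu, gamma1, 1.0, var_w)       # var_x = r^2 = 1 < var_free
```

In the first case the variance constraint is inactive and `beta3_free` is zero up to rounding, with `var_free` equal to `var_w / (mu - 1)`. In the second case the constraint is active and `beta3_tight` is strictly positive, mirroring the complementary-slackness reading given above.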

In conclusion, the Gaussian density function whose variance is $\sigma_{X^*}^2$ minimizes the variational problem
in (163), and the proof is completed.

Remark 13: Unlike the other theorems shown in this paper, Theorem 15 only requires finding necessarily
optimal solutions, a result similar to Theorem 8 in [2].
Appendix M
Proof of Theorem 16

Proof: We first construct the following variational problem (without loss of generality, we assume 
the mean vectors of X, W, and Y are zeros, (cf. Appendix |Q): 



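$$
\min_{f_X, f_Y}\ \iint f_X(\mathbf{x}) f_W(\mathbf{y}-\mathbf{x})\left(-\mu\log f_Y(\mathbf{y}) + \log f_X(\mathbf{x}) + \mu(\mu-1)\log f_W(\mathbf{y}-\mathbf{x})\right)d\mathbf{x}\,d\mathbf{y}
\tag{178}
$$

$$
\begin{aligned}
\text{s.t.}\ & \iint f_X(\mathbf{x}) f_W(\mathbf{y}-\mathbf{x})\,d\mathbf{x}\,d\mathbf{y} = 1,\\
& \iint \mathbf{y}\mathbf{y}^T f_X(\mathbf{x}) f_W(\mathbf{y}-\mathbf{x})\,d\mathbf{x}\,d\mathbf{y} = \iint \mathbf{x}\mathbf{x}^T f_X(\mathbf{x}) f_W(\mathbf{y}-\mathbf{x})\,d\mathbf{x}\,d\mathbf{y}
+ \iint (\mathbf{y}-\mathbf{x})(\mathbf{y}-\mathbf{x})^T f_X(\mathbf{x}) f_W(\mathbf{y}-\mathbf{x})\,d\mathbf{x}\,d\mathbf{y},\\
& \iint \mathbf{x}\mathbf{x}^T f_X(\mathbf{x}) f_W(\mathbf{y}-\mathbf{x})\,d\mathbf{x}\,d\mathbf{y} \preceq \Sigma,\\
& \iint \mathbf{y}\mathbf{y}^T f_X(\mathbf{x}) f_W(\mathbf{y}-\mathbf{x})\,d\mathbf{x}\,d\mathbf{y} = \Sigma_{Y^*},\\
& -\iint f_X(\mathbf{x}) f_W(\mathbf{y}-\mathbf{x})\log f_X(\mathbf{x})\,d\mathbf{x}\,d\mathbf{y} = p_X,\\
& f_Y(\mathbf{y}) = \int f_X(\mathbf{x}) f_W(\mathbf{y}-\mathbf{x})\,d\mathbf{x},
\end{aligned}
\tag{179}
$$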

where $p_X$ is a constant, and $\Sigma_{Y^*}$ is the covariance matrix of the optimal solution of $Y$. Without loss of
generality, the matrix $\Sigma$ is assumed to be a positive definite matrix due to the same reason mentioned
in IS. 

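This problem is more appropriately changed as follows:

$$
\min_{f_X, f_Y}\ \iint f_X(\mathbf{x}) f_W(\mathbf{y}-\mathbf{x})\left(-\mu\log f_Y(\mathbf{y}) + \log f_X(\mathbf{x}) + \mu(\mu-1)\log f_W(\mathbf{y}-\mathbf{x})\right)d\mathbf{x}\,d\mathbf{y}
\tag{180}
$$

$$
\begin{aligned}
\text{s.t.}\ & \iint f_X(\mathbf{x}) f_W(\mathbf{y}-\mathbf{x})\,d\mathbf{x}\,d\mathbf{y} = 1,\\
& \iint \left(y_i y_j - x_i x_j - (\mathbf{y}-\mathbf{x})_i(\mathbf{y}-\mathbf{x})_j\right) f_X(\mathbf{x}) f_W(\mathbf{y}-\mathbf{x})\,d\mathbf{x}\,d\mathbf{y} = 0,\\
& \sum_{i=1}^{n}\sum_{j=1}^{n}\left(\iint x_i x_j \xi_i \xi_j f_X(\mathbf{x}) f_W(\mathbf{y}-\mathbf{x})\,d\mathbf{x}\,d\mathbf{y}\right) \leq \sum_{i=1}^{n}\sum_{j=1}^{n}\sigma_{ij}^2\xi_i\xi_j,\\
& \iint y_i y_j f_X(\mathbf{x}) f_W(\mathbf{y}-\mathbf{x})\,d\mathbf{x}\,d\mathbf{y} = \sigma^2_{Y^*_{ij}},\\
& -\iint f_X(\mathbf{x}) f_W(\mathbf{y}-\mathbf{x})\log f_X(\mathbf{x})\,d\mathbf{x}\,d\mathbf{y} = p_X,\\
& f_Y(\mathbf{y}) = \int f_X(\mathbf{x}) f_W(\mathbf{y}-\mathbf{x})\,d\mathbf{x},
\end{aligned}
\tag{181}
$$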
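where the arbitrary deterministic non-zero vector $\boldsymbol{\xi}$ is defined as $[\xi_1, \ldots, \xi_n]^T$, $\sigma^2_{Y^*_{ij}}$ denotes the $i$-th row
and $j$-th column element of $\Sigma_{Y^*}$, $i = 1, \ldots, n$, and $j = 1, \ldots, n$.

Using Lagrange multipliers, the functional problem in (180) and the constraints in (181) are expressed
as

$$
\min_{f_X, f_Y}\ \int\left(\int K(\mathbf{x},\mathbf{y},f_X,f_Y)\,d\mathbf{x}\right) + \tilde K(\mathbf{y},f_Y)\,d\mathbf{y},
\tag{182}
$$

where

$$
\begin{aligned}
K(\mathbf{x},\mathbf{y},f_X,f_Y) &= f_X(\mathbf{x})f_W(\mathbf{y}-\mathbf{x})\Bigg(-\mu\log f_Y(\mathbf{y}) + \log f_X(\mathbf{x}) + \mu(\mu-1)\log f_W(\mathbf{y}-\mathbf{x}) + \alpha_0\\
&\quad + \sum_{i=1}^{n}\sum_{j=1}^{n}\Big(\gamma_{ij}y_i y_j - \gamma_{ij}x_i x_j - \gamma_{ij}(\mathbf{y}-\mathbf{x})_i(\mathbf{y}-\mathbf{x})_j + \theta x_i x_j\xi_i\xi_j + \phi_{ij}y_i y_j\Big)\\
&\quad - \alpha_1\log f_X(\mathbf{x}) - \lambda(\mathbf{y})\Bigg),\\
\tilde K(\mathbf{y},f_Y) &= \lambda(\mathbf{y})f_Y(\mathbf{y}).
\end{aligned}
\tag{183}
$$

Then, the first-order variation condition is checked as follows.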



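$$
\begin{aligned}
K'_{f_X}\Big|_{f_X=f_{X^*},\,f_Y=f_{Y^*}}
&= f_W(\mathbf{y}-\mathbf{x})\Bigg(-\mu\log f_{Y^*}(\mathbf{y}) + (1-\alpha_1)\log f_{X^*}(\mathbf{x}) + \mu(\mu-1)\log f_W(\mathbf{y}-\mathbf{x}) + \alpha_0\\
&\quad + \sum_{i=1}^{n}\sum_{j=1}^{n}\Big(\gamma_{ij}y_i y_j - \gamma_{ij}x_i x_j - \gamma_{ij}(\mathbf{y}-\mathbf{x})_i(\mathbf{y}-\mathbf{x})_j + \theta x_i x_j\xi_i\xi_j + \phi_{ij}y_i y_j\Big) - \lambda(\mathbf{y}) + 1 - \alpha_1\Bigg)\\
&= 0,
\end{aligned}
\tag{184}
$$

$$
\begin{aligned}
\Big(\textstyle\int K\,d\mathbf{x} + \tilde K\Big)'_{f_Y}\Big|_{f_X=f_{X^*},\,f_Y=f_{Y^*}}
&= -\mu\frac{\int f_{X^*}(\mathbf{x})f_W(\mathbf{y}-\mathbf{x})\,d\mathbf{x}}{f_{Y^*}(\mathbf{y})} + \lambda(\mathbf{y})\\
&= 0.
\end{aligned}
\tag{185}
$$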
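Since the equalities in (184) and (185) must be satisfied for any $\mathbf{x}$ and $\mathbf{y}$,

$$
\begin{aligned}
\lambda(\mathbf{y}) &= \mu,\\
f_{Y^*}(\mathbf{y}) &= \exp\left\{\frac{1}{\mu}\left(\mathbf{y}^T(\Gamma+\Phi)\mathbf{y} + c_Y\right)\right\}\\
&= (2\pi)^{-\frac{n}{2}}\left|-\tfrac{\mu}{2}(\Gamma+\Phi)^{-1}\right|^{-\frac{1}{2}}\exp\left\{-\tfrac{1}{2}\mathbf{y}^T\left(-\tfrac{\mu}{2}(\Gamma+\Phi)^{-1}\right)^{-1}\mathbf{y}\right\}
(2\pi)^{\frac{n}{2}}\left|-\tfrac{\mu}{2}(\Gamma+\Phi)^{-1}\right|^{\frac{1}{2}}\exp\left\{\frac{c_Y}{\mu}\right\},\\
f_W(\mathbf{y}-\mathbf{x}) &= \exp\left\{\frac{1}{\mu(\mu-1)}\left((\mathbf{y}-\mathbf{x})^T\Gamma(\mathbf{y}-\mathbf{x}) - c_W\right)\right\}\\
&= (2\pi)^{-\frac{n}{2}}\left|-\tfrac{\mu(\mu-1)}{2}\Gamma^{-1}\right|^{-\frac{1}{2}}\exp\left\{-\tfrac{1}{2}(\mathbf{y}-\mathbf{x})^T\left(-\tfrac{\mu(\mu-1)}{2}\Gamma^{-1}\right)^{-1}(\mathbf{y}-\mathbf{x})\right\}\\
&\quad\times(2\pi)^{\frac{n}{2}}\left|-\tfrac{\mu(\mu-1)}{2}\Gamma^{-1}\right|^{\frac{1}{2}}\exp\left\{-\frac{c_W}{\mu(\mu-1)}\right\},\\
f_{X^*}(\mathbf{x}) &= \exp\left\{\frac{1}{1-\alpha_1}\left(\mathbf{x}^T(\Gamma-\theta\Xi)\mathbf{x} - \alpha_0 + \mu - 1 + \alpha_1 + c_W + c_Y\right)\right\}\\
&= (2\pi)^{-\frac{n}{2}}\left|-\tfrac{1-\alpha_1}{2}(\Gamma-\theta\Xi)^{-1}\right|^{-\frac{1}{2}}\exp\left\{-\tfrac{1}{2}\mathbf{x}^T\left(-\tfrac{1-\alpha_1}{2}(\Gamma-\theta\Xi)^{-1}\right)^{-1}\mathbf{x}\right\}\\
&\quad\times(2\pi)^{\frac{n}{2}}\left|-\tfrac{1-\alpha_1}{2}(\Gamma-\theta\Xi)^{-1}\right|^{\frac{1}{2}}\exp\left\{\frac{-\alpha_0+\mu-1+\alpha_1+c_W+c_Y}{1-\alpha_1}\right\},
\end{aligned}
\tag{186}
$$

where

$$
\begin{aligned}
\Gamma &= \begin{pmatrix}\gamma_{11} & \cdots & \gamma_{1n}\\ \vdots & \ddots & \vdots\\ \gamma_{n1} & \cdots & \gamma_{nn}\end{pmatrix},\qquad
\Phi = \begin{pmatrix}\phi_{11} & \cdots & \phi_{1n}\\ \vdots & \ddots & \vdots\\ \phi_{n1} & \cdots & \phi_{nn}\end{pmatrix},\qquad
\Xi = \begin{pmatrix}\xi_1\xi_1 & \cdots & \xi_1\xi_n\\ \vdots & \ddots & \vdots\\ \xi_n\xi_1 & \cdots & \xi_n\xi_n\end{pmatrix},\\
\mathbf{x} &= [x_1, \cdots, x_n]^T,\qquad
\mathbf{y} = [y_1, \cdots, y_n]^T,\qquad
\theta \geq 0.
\end{aligned}
\tag{187}
$$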
Considering the constraints in (181), the equations in (186) are further processed as follows.

$$
\begin{aligned}
f_{Y^*}(\mathbf{y}) &= (2\pi)^{-\frac{n}{2}}\left|\Sigma_{Y^*}\right|^{-\frac{1}{2}}\exp\left\{-\tfrac{1}{2}\mathbf{y}^T\Sigma_{Y^*}^{-1}\mathbf{y}\right\},\\
f_W(\mathbf{y}-\mathbf{x}) &= (2\pi)^{-\frac{n}{2}}\left|\Sigma_W\right|^{-\frac{1}{2}}\exp\left\{-\tfrac{1}{2}(\mathbf{y}-\mathbf{x})^T\Sigma_W^{-1}(\mathbf{y}-\mathbf{x})\right\},\\
f_{X^*}(\mathbf{x}) &= (2\pi)^{-\frac{n}{2}}\left|\Sigma_{X^*}\right|^{-\frac{1}{2}}\exp\left\{-\tfrac{1}{2}\mathbf{x}^T\Sigma_{X^*}^{-1}\mathbf{x}\right\},
\end{aligned}
\tag{188}
$$
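where

$$
\begin{aligned}
\alpha_0 &= \mu - (1-\alpha_1) + c_W + c_Y + \frac{1-\alpha_1}{2}\log\left((2\pi)^n\left|\Sigma_{X^*}\right|\right)\\
&= \mu - (1-\alpha_1) + \frac{\mu(\mu-1)}{2}\log\left((2\pi)^n\left|\Sigma_W\right|\right) - \frac{\mu}{2}\log\left((2\pi)^n\left|\Sigma_{Y^*}\right|\right) + \frac{1-\alpha_1}{2}\log\left((2\pi)^n\left|\Sigma_{X^*}\right|\right),\\
\Gamma &= -\frac{\mu(\mu-1)}{2}\Sigma_W^{-1},\qquad
\Phi = -\frac{\mu}{2}\Sigma_{Y^*}^{-1} - \Gamma,\\
c_W &= \frac{\mu(\mu-1)}{2}\log\left((2\pi)^n\left|\Sigma_W\right|\right),\qquad
c_Y = -\frac{\mu}{2}\log\left((2\pi)^n\left|\Sigma_{Y^*}\right|\right),
\end{aligned}
$$

$$
\Sigma_{X^*}^{-1} = -\frac{2}{1-\alpha_1}\left(\Gamma - \theta\Xi\right) \succeq 0,
\tag{189}
$$

$$
\theta\Xi \succeq 0,
\tag{190}
$$

$$
\alpha_1 \leq 1 - \mu,
\tag{191}
$$

$$
\left|\Sigma_{X^*}\right| = \left(\frac{1}{2\pi e}\exp\left\{\frac{2}{n}p_X\right\}\right)^n \leq \left|\Sigma\right|.
\tag{192}
$$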



The inequality in (190) is always satisfied since the matrix $\Xi$ is non-zero positive semi-definite and $\theta$ is
non-negative. The inequality in (192) will be proved later in this proof. The constant $p_X$ must be chosen
to satisfy the inequality in (192). Then, the Lagrange multipliers always exist, and necessary optimal
solutions exist.

Interestingly, similar to the complementary slackness in KKT conditions, when $\theta = 0$ in (189), $\Sigma_{X^*} =
(1-\alpha_1)\mu^{-1}(\mu-1)^{-1}\Sigma_W$, and it requires $(1-\alpha_1)\mu^{-1}(\mu-1)^{-1}\Sigma_W \preceq \Sigma$. When $\theta$ is non-zero, the
expression in (189) is positive semi-definite, and it means $\Sigma_{X^*} = (1-\alpha_1)\mu^{-1}(\mu-1)^{-1}\Sigma_{\tilde W}$, where
$\Sigma_{\tilde W} = \Sigma_W - \Sigma_{\hat W}$, and where $\Sigma_{\tilde W}$ and $\Sigma_{\hat W}$ are positive semi-definite matrices. When $1-\alpha_1 = \mu$, then
$\Sigma_{X^*} = (\mu-1)^{-1}\Sigma_{\tilde W}$, which is exactly the same as the one in [2] and [27].

To make the second variation positive, we need the positive-definiteness of the following matrix:

$$
\begin{pmatrix}
K''_{f_{X^*}f_{X^*}} & K''_{f_{X^*}f_{Y^*}}\\
K''_{f_{Y^*}f_{X^*}} & K''_{f_{Y^*}f_{Y^*}}
\end{pmatrix},
\tag{193}
$$
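and it requires the following condition to hold:

$$
\begin{aligned}
&\begin{pmatrix} h_X & h_Y \end{pmatrix}
\begin{pmatrix}
K''_{f_{X^*}f_{X^*}} & K''_{f_{X^*}f_{Y^*}}\\
K''_{f_{Y^*}f_{X^*}} & K''_{f_{Y^*}f_{Y^*}}
\end{pmatrix}
\begin{pmatrix} h_X \\ h_Y \end{pmatrix}\\
&\quad= K''_{f_{X^*}f_{X^*}}h_X^2 + K''_{f_{Y^*}f_{Y^*}}h_Y^2 + \big(K''_{f_{X^*}f_{Y^*}}+K''_{f_{Y^*}f_{X^*}}\big)h_Y h_X\\
&\quad\geq 0,
\end{aligned}
\tag{194}
$$

where $h_X$ and $h_Y$ are arbitrary admissible functions.

Since $K''_{f_{X^*}f_{X^*}}$, $K''_{f_{X^*}f_{Y^*}}$, $K''_{f_{Y^*}f_{X^*}}$, and $K''_{f_{Y^*}f_{Y^*}}$ are defined as

$$
\begin{aligned}
K''_{f_{X^*}f_{X^*}} &= \frac{(1-\alpha_1)f_W(\mathbf{y}-\mathbf{x})}{f_{X^*}(\mathbf{x})}, &
K''_{f_{X^*}f_{Y^*}} &= -\frac{\mu f_W(\mathbf{y}-\mathbf{x})}{f_{Y^*}(\mathbf{y})},\\
K''_{f_{Y^*}f_{X^*}} &= -\frac{\mu f_W(\mathbf{y}-\mathbf{x})}{f_{Y^*}(\mathbf{y})}, &
K''_{f_{Y^*}f_{Y^*}} &= \frac{\mu f_{X^*}(\mathbf{x})f_W(\mathbf{y}-\mathbf{x})}{f_{Y^*}(\mathbf{y})^2},
\end{aligned}
\tag{195}
$$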



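the expression in (194) requires

$$
\begin{aligned}
&\frac{(1-\alpha_1)f_W(\mathbf{y}-\mathbf{x})}{f_{X^*}(\mathbf{x})}h_X(\mathbf{x})^2 - 2\frac{\mu f_W(\mathbf{y}-\mathbf{x})}{f_{Y^*}(\mathbf{y})}h_X(\mathbf{x})h_Y(\mathbf{y}) + \frac{\mu f_{X^*}(\mathbf{x})f_W(\mathbf{y}-\mathbf{x})}{f_{Y^*}(\mathbf{y})^2}h_Y(\mathbf{y})^2\\
&\geq \frac{\mu f_W(\mathbf{y}-\mathbf{x})}{f_{X^*}(\mathbf{x})}\left(h_X(\mathbf{x}) - \frac{f_{X^*}(\mathbf{x})}{f_{Y^*}(\mathbf{y})}h_Y(\mathbf{y})\right)^2 \geq 0,
\end{aligned}
\tag{196}
$$

where $\alpha_1 \leq 1-\mu$.

Therefore, the optimal solutions $f_{X^*}$ and $f_{Y^*}$ minimize the functional problem in (180), and the proof
is completed. ■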

Appendix N
Proof of Theorem 17

Proof: First, choose a Gaussian random vector $\tilde W_G$ whose covariance matrix $\Sigma_{\tilde W}$ satisfies $\Sigma_{\tilde W} \preceq \Sigma_W$
and $\Sigma_{\tilde W} \preceq \Sigma_V$. Since the Gaussian random vectors $V_G$ and $W_G$ can be represented as the sum of
two independent random vectors $\tilde W_G$ and $\hat V_G$, and the sum of two independent random vectors
$\tilde W_G$ and $\hat W_G$, respectively, the left-hand side of the inequality in (40) is written as follows:

$$
\begin{aligned}
&\mu h(X + V_G) - h(X + W_G)\\
&\geq \mu h(X + V_G) - h(X + \tilde W_G) - h(W_G) + h(\tilde W_G)\\
&= \mu h(X + \tilde W_G + \hat V_G) - h(X + \tilde W_G) - h(\tilde W_G + \hat W_G) + h(\tilde W_G).
\end{aligned}
\tag{197}
$$
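Since the expression will be minimized over $f_X(\mathbf{x})$, the last two terms in (197) are ignored, and by
substituting $Y$ and $\tilde X$ for $X + \tilde W_G + \hat V_G$ and $X + \tilde W_G$, respectively, the inequality in (40) is equivalently
expressed as the following variational problem:

$$
\begin{aligned}
\min_{f_{\tilde X},\, f_Y}\ & \mu h(Y) - h(\tilde X) - \mu(\mu-1)h(\hat V_G)\\
\text{s.t.}\ & \iint f_{\tilde X}(\mathbf{x}) f_{\hat V}(\mathbf{y}-\mathbf{x})\,d\mathbf{x}\,d\mathbf{y} - 1 = 0,\\
& \iint f_{\tilde X}(\mathbf{x}) f_{\hat V}(\mathbf{y}-\mathbf{x})\,\mathbf{x}\mathbf{x}^T d\mathbf{x}\,d\mathbf{y} - \Sigma_{\tilde X} \preceq 0,\\
& \iint f_{\tilde X}(\mathbf{x}) f_{\hat V}(\mathbf{y}-\mathbf{x})\,\mathbf{y}\mathbf{y}^T d\mathbf{x}\,d\mathbf{y} - \Sigma_{Y^*} = 0,\\
& \iint f_{\tilde X}(\mathbf{x}) f_{\hat V}(\mathbf{y}-\mathbf{x})\left(\mathbf{y}\mathbf{y}^T - \mathbf{x}\mathbf{x}^T - (\mathbf{y}-\mathbf{x})(\mathbf{y}-\mathbf{x})^T\right)d\mathbf{x}\,d\mathbf{y} = 0,\\
& -\iint f_{\tilde X}(\mathbf{x}) f_{\hat V}(\mathbf{y}-\mathbf{x})\log f_{\tilde X}(\mathbf{x})\,d\mathbf{x}\,d\mathbf{y} = p_{\tilde X},\\
& f_Y(\mathbf{y}) = \int f_{\tilde X}(\mathbf{x}) f_{\hat V}(\mathbf{y}-\mathbf{x})\,d\mathbf{x},
\end{aligned}
\tag{198}
$$

where $\tilde X = X + \tilde W_G$, $Y = \tilde X + \hat V_G$, $W_G = \tilde W_G + \hat W_G$, $V_G = \tilde W_G + \hat V_G$, $\Sigma_{\tilde X} = \Sigma + \Sigma_{\tilde W}$,
$\Sigma_{Y^*} = \Sigma_{\tilde X^*} + \Sigma_{\hat V}$, and $\Sigma_{\tilde X^*}$ is the covariance matrix of the optimal solution $\tilde X^*$.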

The variational problem in (198) is exactly the same as the one in (180). Therefore, using the same
method as in the proof of Theorem 16, we obtain the following inequality (see the details of the proof
in Appendix M):

$$
\begin{aligned}
&\mu h(X + \tilde W_G + \hat V_G) - h(X + \tilde W_G) - h(\tilde W_G + \hat W_G) + h(\tilde W_G)\\
&\geq \mu h(X^*_G + \tilde W_G + \hat V_G) - h(X^*_G + \tilde W_G) - h(\tilde W_G + \hat W_G) + h(\tilde W_G).
\end{aligned}
\tag{199}
$$

By appropriately choosing $X^*_G$ and $\tilde W_G$, the right-hand side of (199) is expressed as

$$
\begin{aligned}
&\mu h(X^*_G + \tilde W_G + \hat V_G) - h(X^*_G + \tilde W_G) - h(\tilde W_G + \hat W_G) + h(\tilde W_G)\\
&= \mu h(X^*_G + \tilde W_G + \hat V_G) - h(X^*_G + W_G).
\end{aligned}
\tag{200}
$$

The equality in (200) is due to the equality condition of the data processing inequality in [27]. For
completeness of the proof, we introduce a technique that is slightly different from the one in [27].

To satisfy the equality in (200), the equality condition in the following lemma must be
satisfied.

Lemma 1 (Data Processing Inequality [27]): When three random vectors $Y_1$, $Y_2$, and $Y_3$ represent a
Markov chain $Y_1 \to Y_2 \to Y_3$, the following inequality is satisfied:

$$
I(Y_1; Y_3) \leq I(Y_1; Y_2).
\tag{201}
$$

The equality holds if and only if $I(Y_1; Y_2 \mid Y_3) = 0$.
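For intuition, the lemma can be checked in the simplest scalar Gaussian setting. The sketch below assumes a chain $Y_1 = X$, $Y_2 = X + N_1$, $Y_3 = X + N_1 + N_2$ with independent zero-mean Gaussian terms, and uses the standard formula $I(X; X+N) = \tfrac{1}{2}\log(1 + \sigma_X^2/\sigma_N^2)$; the function names and variances are illustrative assumptions. For a Markov chain, the chain rule gives $I(Y_1;Y_2\mid Y_3) = I(Y_1;Y_2) - I(Y_1;Y_3)$, so the gap computed here is exactly the conditional mutual information in the equality condition.

```python
import math

def gaussian_mi(var_signal, var_noise):
    """Mutual information (nats) between X and X+N for independent scalar Gaussians."""
    return 0.5 * math.log(1.0 + var_signal / var_noise)

def dpi_gap(var_x, var_n1, var_n2):
    """I(Y1;Y2) - I(Y1;Y3) for Y1 = X, Y2 = X + N1, Y3 = X + N1 + N2."""
    return gaussian_mi(var_x, var_n1) - gaussian_mi(var_x, var_n1 + var_n2)

gap_noisy = dpi_gap(2.0, 1.0, 0.5)   # extra noise present: strictly positive gap
gap_none = dpi_gap(2.0, 1.0, 0.0)    # no extra noise: gap is zero
```

In this scalar sketch the gap vanishes only when the second noise term has zero variance. The vector setting of the proof is richer, since the equality condition there is a matrix identity rather than a statement about a single variance.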

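In Lemma 1, $Y_1$, $Y_2$, and $Y_3$ are defined as $X^*_G$, $X^*_G + \tilde W_G$, and $X^*_G + \tilde W_G + \hat W_G$, respectively.
Therefore, the equality condition, $I(Y_1; Y_2 \mid Y_3) = 0$, is expressed as

$$
\begin{aligned}
I(Y_1; Y_2 \mid Y_3)
&= h(Y_1 \mid Y_3) - h(Y_1 \mid Y_2, Y_3)\\
&= \tfrac{1}{2}\log\left((2\pi e)^n\left|\Sigma_{Y_1|Y_3}\right|\right) - \tfrac{1}{2}\log\left((2\pi e)^n\left|\Sigma_{Y_1|Y_2}\right|\right)\\
&= \tfrac{1}{2}\log\left((2\pi e)^n\left|\Sigma_{X^*} - \Sigma_{X^*}\left(\Sigma_{X^*} + \Sigma_{\tilde W} + \Sigma_{\hat W}\right)^{-1}\Sigma_{X^*}\right|\right)
- \tfrac{1}{2}\log\left((2\pi e)^n\left|\Sigma_{X^*} - \Sigma_{X^*}\left(\Sigma_{X^*} + \Sigma_{\tilde W}\right)^{-1}\Sigma_{X^*}\right|\right)\\
&= \tfrac{1}{2}\log\left((2\pi e)^n\left|\Sigma_{X^*}\right|\left|I - \left(\Sigma_{X^*} + \Sigma_{\tilde W} + \Sigma_{\hat W}\right)^{-1}\Sigma_{X^*}\right|\right)
- \tfrac{1}{2}\log\left((2\pi e)^n\left|\Sigma_{X^*}\right|\left|I - \left(\Sigma_{X^*} + \Sigma_{\tilde W}\right)^{-1}\Sigma_{X^*}\right|\right)\\
&= \tfrac{1}{2}\log\left((2\pi e)^n\left|I - \left(\Sigma_{X^*} + \Sigma_{\tilde W} + \Sigma_{\hat W}\right)^{-1}\Sigma_{X^*}\right|\right)
- \tfrac{1}{2}\log\left((2\pi e)^n\left|I - \left(\Sigma_{X^*} + \Sigma_{\tilde W}\right)^{-1}\Sigma_{X^*}\right|\right)\\
&= \tfrac{1}{2}\log\left((2\pi e)^n\left|I - \left(\Sigma_{X^*} + \Sigma_W\right)^{-1}\Sigma_{X^*}\right|\right)
- \tfrac{1}{2}\log\left((2\pi e)^n\left|I - \left(\Sigma_{X^*} + \Sigma_{\tilde W}\right)^{-1}\Sigma_{X^*}\right|\right)\\
&= 0.
\end{aligned}
\tag{202}
$$

If $(\Sigma_{X^*} + \Sigma_W)^{-1}\Sigma_{X^*} = (\Sigma_{X^*} + \Sigma_{\tilde W})^{-1}\Sigma_{X^*}$, the equality in (202) is satisfied, the equality condition
in Lemma 1 holds, and therefore, the equality in (200) is proved. The validity of $(\Sigma_{X^*} + \Sigma_W)^{-1}\Sigma_{X^*} =
(\Sigma_{X^*} + \Sigma_{\tilde W})^{-1}\Sigma_{X^*}$ is proved by Lemma 8 in [27].

Therefore, $I(Y_1; Y_2 \mid Y_3) = 0$, and, from (197), (199), and (200), we obtain the
following extremal entropy inequality:

$$
\begin{aligned}
&\mu h(X + V_G) - h(X + W_G)\\
&\geq \mu h(X + V_G) - h(X + \tilde W_G) - h(W_G) + h(\tilde W_G)\\
&= \mu h(X + \tilde W_G + \hat V_G) - h(X + \tilde W_G) - h(\tilde W_G + \hat W_G) + h(\tilde W_G)\\
&\geq \mu h(X^*_G + \tilde W_G + \hat V_G) - h(X^*_G + \tilde W_G) - h(\tilde W_G + \hat W_G) + h(\tilde W_G)\\
&= \mu h(X^*_G + \tilde W_G + \hat V_G) - h(X^*_G + W_G)\\
&= \mu h(X^*_G + V_G) - h(X^*_G + W_G),
\end{aligned}
$$

and the proof is completed.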